* [RFC 00/38] PPGTT dynamic page allocations
@ 2014-10-07 17:10 Michel Thierry
2014-10-07 17:10 ` [RFC 01/38] drm/i915: Add some extra guards in evict_vm Michel Thierry
` (39 more replies)
0 siblings, 40 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:10 UTC (permalink / raw)
To: intel-gfx
This is based on the first 55 patches of Ben's 48b addressing work, taking
into consideration the latest changes in (mainly aliasing) ppgtt rules.
Because of these changes in the tree, the first 17 patches of the original
series are no longer needed, and some patches required more rework than others.
For GEN8, it has also been extended to work in logical ring submission (lrc)
mode, as it looks like it will be the preferred mode of operation.
I also tried to update the lrc code at the same time the ppgtt refactoring
occurred, leaving only one patch that is exclusively for lrc.
I'm asking for comments, as this is the foundation for 48b virtual addressing
in Broadwell.
This series can be seen in 3 parts:
[01-24] Reworks existing PPGTT code (all GENs).
[25-28] Adds page table allocation for GEN6/GEN7.
[29-38] Enables dynamic allocation in GEN8, for both legacy and execlist
submission modes.
Ben Widawsky (37):
drm/i915: Add some extra guards in evict_vm
drm/i915/trace: Fix offsets for 64b
drm/i915: Wrap VMA binding
drm/i915: Make pin global flags explicit
drm/i915: Split out aliasing binds
drm/i915: fix gtt_total_entries()
drm/i915: Rename to GEN8_LEGACY_PDPES
drm/i915: Split out verbose PPGTT dumping
drm/i915: s/pd/pdpe, s/pt/pde
drm/i915: rename map/unmap to dma_map/unmap
drm/i915: Setup less PPGTT on failed pagedir
drm/i915: Un-hardcode number of page directories
drm/i915: Make gen6_write_pdes gen6_map_page_tables
drm/i915: Range clearing is PPGTT agnostic
drm/i915: Page table helpers, and define renames
drm/i915: construct page table abstractions
drm/i915: Complete page table structures
drm/i915: Create page table allocators
drm/i915: Generalize GEN6 mapping
drm/i915: Clean up pagetable DMA map & unmap
drm/i915: Always dma map page table allocations
drm/i915: Consolidate dma mappings
drm/i915: Always dma map page directory allocations
drm/i915: Track GEN6 page table usage
drm/i915: Extract context switch skip logic
drm/i915: Track page table reload need
drm/i915: Initialize all contexts
drm/i915: Finish gen6/7 dynamic page table allocation
drm/i915/bdw: Use dynamic allocation idioms on free
drm/i915/bdw: pagedirs rework allocation
drm/i915/bdw: pagetable allocation rework
drm/i915/bdw: Make the pdp switch a bit less hacky
drm/i915: num_pd_pages/num_pd_entries isn't useful
drm/i915: Extract PPGTT param from pagedir alloc
drm/i915/bdw: Split out mappings
drm/i915/bdw: begin bitmap tracking
drm/i915/bdw: Dynamic page table allocations
Michel Thierry (1):
drm/i915/bdw: Dynamic page table allocations in lrc mode
drivers/gpu/drm/i915/i915_debugfs.c | 78 +-
drivers/gpu/drm/i915/i915_drv.h | 22 +-
drivers/gpu/drm/i915/i915_gem.c | 35 +-
drivers/gpu/drm/i915/i915_gem_context.c | 64 +-
drivers/gpu/drm/i915/i915_gem_evict.c | 3 +
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 20 +-
drivers/gpu/drm/i915/i915_gem_gtt.c | 1306 ++++++++++++++++++++--------
drivers/gpu/drm/i915/i915_gem_gtt.h | 305 ++++++-
drivers/gpu/drm/i915/i915_gem_stolen.c | 2 +-
drivers/gpu/drm/i915/i915_trace.h | 124 ++-
drivers/gpu/drm/i915/intel_lrc.c | 80 +-
11 files changed, 1557 insertions(+), 482 deletions(-)
--
2.0.3
^ permalink raw reply [flat|nested] 53+ messages in thread
* [RFC 01/38] drm/i915: Add some extra guards in evict_vm
From: Michel Thierry @ 2014-10-07 17:10 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_evict.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 886ff2e..7fd8b9b 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -214,6 +214,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
struct i915_vma *vma, *next;
int ret;
+ BUG_ON(!mutex_is_locked(&vm->dev->struct_mutex));
trace_i915_gem_evict_vm(vm);
if (do_idle) {
@@ -222,6 +223,8 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
return ret;
i915_gem_retire_requests(vm->dev);
+
+ WARN_ON(!list_empty(&vm->active_list));
}
list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
--
2.0.3
* [RFC 02/38] drm/i915/trace: Fix offsets for 64b
From: Michel Thierry @ 2014-10-07 17:10 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_trace.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f5aa006..cbf5521 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -115,7 +115,7 @@ TRACE_EVENT(i915_vma_bind,
TP_STRUCT__entry(
__field(struct drm_i915_gem_object *, obj)
__field(struct i915_address_space *, vm)
- __field(u32, offset)
+ __field(u64, offset)
__field(u32, size)
__field(unsigned, flags)
),
@@ -128,7 +128,7 @@ TRACE_EVENT(i915_vma_bind,
__entry->flags = flags;
),
- TP_printk("obj=%p, offset=%08x size=%x%s vm=%p",
+ TP_printk("obj=%p, offset=%016llx size=%x%s vm=%p",
__entry->obj, __entry->offset, __entry->size,
__entry->flags & PIN_MAPPABLE ? ", mappable" : "",
__entry->vm)
@@ -141,7 +141,7 @@ TRACE_EVENT(i915_vma_unbind,
TP_STRUCT__entry(
__field(struct drm_i915_gem_object *, obj)
__field(struct i915_address_space *, vm)
- __field(u32, offset)
+ __field(u64, offset)
__field(u32, size)
),
@@ -152,7 +152,7 @@ TRACE_EVENT(i915_vma_unbind,
__entry->size = vma->node.size;
),
- TP_printk("obj=%p, offset=%08x size=%x vm=%p",
+ TP_printk("obj=%p, offset=%016llx size=%x vm=%p",
__entry->obj, __entry->offset, __entry->size, __entry->vm)
);
--
2.0.3
* [RFC 03/38] drm/i915: Wrap VMA binding
From: Michel Thierry @ 2014-10-07 17:10 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
This will be useful for some upcoming patches which do more platform
specific work. Having it in one central place just makes things a bit
cleaner and easier.
NOTE: I didn't actually end up using this patch for the intended
purpose, but I thought it was a nice patch to keep around.
v2: s/i915_gem_bind_vma/i915_gem_vma_bind/
s/i915_gem_unbind_vma/i915_gem_vma_unbind/
(Chris)
v3: Missed one spot
v4: Don't change the trace events (Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 +++
drivers/gpu/drm/i915/i915_gem.c | 12 ++++++------
drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 13 ++++++++++++-
5 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 39f2181..3c725ec 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2558,6 +2558,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
struct i915_address_space *vm);
unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
struct i915_address_space *vm);
+void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level,
+ unsigned flags);
+void i915_gem_vma_unbind(struct i915_vma *vma);
struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
struct i915_address_space *vm);
struct i915_vma *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fabe6fd..7745d22 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2989,7 +2989,7 @@ int i915_vma_unbind(struct i915_vma *vma)
trace_i915_vma_unbind(vma);
- vma->unbind_vma(vma);
+ i915_gem_vma_unbind(vma);
list_del_init(&vma->mm_list);
if (i915_is_ggtt(vma->vm))
@@ -3509,8 +3509,8 @@ search_free:
WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
trace_i915_vma_bind(vma, flags);
- vma->bind_vma(vma, obj->cache_level,
- flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
+ i915_gem_vma_bind(vma, obj->cache_level,
+ flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
return vma;
@@ -3717,8 +3717,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
list_for_each_entry(vma, &obj->vma_list, vma_link)
if (drm_mm_node_allocated(&vma->node))
- vma->bind_vma(vma, cache_level,
- obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
+ i915_gem_vma_bind(vma, cache_level,
+ obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
}
list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -4115,7 +4115,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
}
if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
- vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+ i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
vma->pin_count++;
if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d1ed21a..813af4c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -581,7 +581,7 @@ static int do_switch(struct intel_engine_cs *ring,
if (!to->legacy_hw_ctx.rcs_state->has_global_gtt_mapping) {
struct i915_vma *vma = i915_gem_obj_to_vma(to->legacy_hw_ctx.rcs_state,
&dev_priv->gtt.base);
- vma->bind_vma(vma, to->legacy_hw_ctx.rcs_state->cache_level, GLOBAL_BIND);
+ i915_gem_vma_bind(vma, to->legacy_hw_ctx.rcs_state->cache_level, GLOBAL_BIND);
}
/* GEN8 does *not* require an explicit reload if the PDPs have been
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1a0611b..4564988 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -361,7 +361,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
struct i915_vma *vma =
list_first_entry(&target_i915_obj->vma_list,
typeof(*vma), vma_link);
- vma->bind_vma(vma, target_i915_obj->cache_level, GLOBAL_BIND);
+ i915_gem_vma_bind(vma, target_i915_obj->cache_level,
+ GLOBAL_BIND);
}
/* Validate that the target is in a valid r/w GPU domain */
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6e03bf8..0c203f4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1337,7 +1337,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
* without telling our object about it. So we need to fake it.
*/
obj->has_global_gtt_mapping = 0;
- vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+ i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
}
@@ -2134,6 +2134,17 @@ int i915_gem_gtt_init(struct drm_device *dev)
return 0;
}
+void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
+ unsigned flags)
+{
+ vma->bind_vma(vma, cache_level, flags);
+}
+
+void i915_gem_vma_unbind(struct i915_vma *vma)
+{
+ vma->unbind_vma(vma);
+}
+
static struct i915_vma *__i915_gem_vma_create(struct drm_i915_gem_object *obj,
struct i915_address_space *vm)
{
--
2.0.3
* [RFC 04/38] drm/i915: Make pin global flags explicit
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
The driver currently lets callers pin global, and then tries to do
things correctly inside the function. Doing so has two downsides:
1. It's not possible to pin exclusively to the global, or to the aliasing,
address space.
2. It's difficult to read and understand.
The eventual goal, when realized, should fix both of these issues. This
patch, which should have no functional impact, begins to address them
without intentionally breaking things.
v2: Replace PIN_GLOBAL with PIN_ALIASING in _pin(). Copy paste error
v3: Rebased/reworked with flag conflict from negative relocations
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++------
drivers/gpu/drm/i915/i915_gem.c | 31 +++++++++++++++++++++++-------
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++++++--
drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++++-
5 files changed, 49 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3c725ec..6b60e90 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2396,11 +2396,13 @@ void i915_init_vm(struct drm_i915_private *dev_priv,
void i915_gem_free_object(struct drm_gem_object *obj);
void i915_gem_vma_destroy(struct i915_vma *vma);
-#define PIN_MAPPABLE 0x1
-#define PIN_NONBLOCK 0x2
-#define PIN_GLOBAL 0x4
-#define PIN_OFFSET_BIAS 0x8
-#define PIN_OFFSET_MASK (~4095)
+#define PIN_MAPPABLE (1<<0)
+#define PIN_NONBLOCK (1<<1)
+#define PIN_GLOBAL (1<<2)
+#define PIN_ALIASING (1<<3)
+#define PIN_GLOBAL_ALIASED (PIN_ALIASING | PIN_GLOBAL)
+#define PIN_OFFSET_BIAS (1<<4)
+#define PIN_OFFSET_MASK (PAGE_MASK)
int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
struct i915_address_space *vm,
uint32_t alignment,
@@ -2618,7 +2620,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
unsigned flags)
{
return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj),
- alignment, flags | PIN_GLOBAL);
+ alignment, flags | PIN_GLOBAL_ALIASED);
}
static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7745d22..dfb20e6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3421,8 +3421,12 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
unsigned long end =
flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
struct i915_vma *vma;
+ u32 vma_bind_flags = 0;
int ret;
+ if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
+ flags |= PIN_GLOBAL;
+
fence_size = i915_gem_get_gtt_size(dev,
obj->base.size,
obj->tiling_mode);
@@ -3508,9 +3512,11 @@ search_free:
WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
+ if (flags & PIN_GLOBAL_ALIASED)
+ vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
+
trace_i915_vma_bind(vma, flags);
- i915_gem_vma_bind(vma, obj->cache_level,
- flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
+ i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
return vma;
@@ -3716,9 +3722,14 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
}
list_for_each_entry(vma, &obj->vma_list, vma_link)
- if (drm_mm_node_allocated(&vma->node))
- i915_gem_vma_bind(vma, cache_level,
- obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
+ if (drm_mm_node_allocated(&vma->node)) {
+ u32 bind_flags = 0;
+ if (obj->has_global_gtt_mapping)
+ bind_flags |= GLOBAL_BIND;
+ if (obj->has_aliasing_ppgtt_mapping)
+ bind_flags |= ALIASING_BIND;
+ i915_gem_vma_bind(vma, cache_level, bind_flags);
+ }
}
list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -4114,8 +4125,14 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
return PTR_ERR(vma);
}
- if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
- i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
+ if (flags & PIN_GLOBAL_ALIASED) {
+ u32 bind_flags = 0;
+ if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
+ bind_flags |= GLOBAL_BIND;
+ if (flags & PIN_ALIASING && !obj->has_aliasing_ppgtt_mapping)
+ bind_flags |= ALIASING_BIND;
+ i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
+ }
vma->pin_count++;
if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 4564988..92191f0 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -362,7 +362,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
list_first_entry(&target_i915_obj->vma_list,
typeof(*vma), vma_link);
i915_gem_vma_bind(vma, target_i915_obj->cache_level,
- GLOBAL_BIND);
+ GLOBAL_BIND | ALIASING_BIND);
}
/* Validate that the target is in a valid r/w GPU domain */
@@ -533,6 +533,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
flags = 0;
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_MAPPABLE;
+ /* FIXME: What kind of bind does Chris want? */
if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
flags |= PIN_GLOBAL;
if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0c203f4..d725883 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1336,8 +1336,16 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
* Unfortunately above, we've just wiped out the mappings
* without telling our object about it. So we need to fake it.
*/
- obj->has_global_gtt_mapping = 0;
- i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
+ if (obj->has_global_gtt_mapping || obj->has_aliasing_ppgtt_mapping) {
+ u32 bind_flags = 0;
+ if (obj->has_global_gtt_mapping)
+ bind_flags |= GLOBAL_BIND;
+ if (obj->has_aliasing_ppgtt_mapping)
+ bind_flags |= ALIASING_BIND;
+ obj->has_global_gtt_mapping = 0;
+ obj->has_aliasing_ppgtt_mapping = 0;
+ i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
+ }
}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d5c14af..5fd7fa9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -155,8 +155,12 @@ struct i915_vma {
* setting the valid PTE entries to a reserved scratch page. */
void (*unbind_vma)(struct i915_vma *vma);
/* Map an object into an address space with the given cache flags. */
+
+/* Only use this if you know you want a strictly global binding */
#define GLOBAL_BIND (1<<0)
-#define PTE_READ_ONLY (1<<1)
+/* Only use this if you know you want a strictly aliased binding */
+#define ALIASING_BIND (1<<1)
+#define PTE_READ_ONLY (1<<2)
void (*bind_vma)(struct i915_vma *vma,
enum i915_cache_level cache_level,
u32 flags);
--
2.0.3
* [RFC 05/38] drm/i915: Split out aliasing binds
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
This patch finishes off actually separating the aliasing and global
binds. Prior to this, all global binds would be aliased. Now if aliasing
binds are required, they must be explicitly asked for. So far, we have
no users of this outside of execbuf - but Mika has already submitted a
patch requiring just this.
A nice benefit of this is we should no longer be able to clobber GTT
only objects from the aliasing PPGTT.
v2: Only add aliasing binds for the GGTT/Aliasing PPGTT at execbuf
v3: Rebase resolution with changed size of flags
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
4 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6b60e90..c0fea18 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2620,7 +2620,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
unsigned flags)
{
return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj),
- alignment, flags | PIN_GLOBAL_ALIASED);
+ alignment, flags | PIN_GLOBAL);
}
static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dfb20e6..98186b2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3512,8 +3512,10 @@ search_free:
WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
- if (flags & PIN_GLOBAL_ALIASED)
- vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
+ if (flags & PIN_ALIASING)
+ vma_bind_flags = ALIASING_BIND;
+ if (flags & PIN_GLOBAL)
+ vma_bind_flags = GLOBAL_BIND;
trace_i915_vma_bind(vma, flags);
i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 92191f0..d3a89e6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -527,10 +527,11 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
{
struct drm_i915_gem_object *obj = vma->obj;
struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
- uint64_t flags;
+ uint64_t flags = 0;
int ret;
- flags = 0;
+ if (i915_is_ggtt(vma->vm))
+ flags = PIN_ALIASING;
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_MAPPABLE;
/* FIXME: What kind of bind does Chris want? */
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d725883..ac0197f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1597,6 +1597,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
}
}
+ if (!(flags & ALIASING_BIND))
+ return;
+
if (dev_priv->mm.aliasing_ppgtt &&
(!obj->has_aliasing_ppgtt_mapping ||
(cache_level != obj->cache_level))) {
--
2.0.3
* [RFC 06/38] drm/i915: fix gtt_total_entries()
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
It's useful to have this as a real function rather than a macro for some
upcoming work. Since we generally try to avoid macros anyway, I think it
doesn't hurt to make this its own patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++--
drivers/gpu/drm/i915/i915_gem_gtt.h | 8 ++++++--
drivers/gpu/drm/i915/i915_gem_stolen.c | 2 +-
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ac0197f..f677deb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1489,7 +1489,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
unsigned num_entries = length >> PAGE_SHIFT;
gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
- const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
+ const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
int i;
if (WARN(num_entries > max_entries,
@@ -1515,7 +1515,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
unsigned num_entries = length >> PAGE_SHIFT;
gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
- const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
+ const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
int i;
if (WARN(num_entries > max_entries,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 5fd7fa9..98427ce 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -40,8 +40,6 @@ typedef uint32_t gen6_gtt_pte_t;
typedef uint64_t gen8_gtt_pte_t;
typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
-#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
-
#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
/* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
#define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0))
@@ -284,6 +282,12 @@ int i915_ppgtt_init_hw(struct drm_device *dev);
void i915_ppgtt_release(struct kref *kref);
struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_device *dev,
struct drm_i915_file_private *fpriv);
+
+static inline size_t gtt_total_entries(struct i915_gtt *gtt)
+{
+ return gtt->base.total >> PAGE_SHIFT;
+}
+
static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
{
if (ppgtt)
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 85fda6b..4e1b22e 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -90,7 +90,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
(gtt_start & PGTBL_ADDRESS_HI_MASK) << 28;
else
gtt_start &= PGTBL_ADDRESS_LO_MASK;
- gtt_end = gtt_start + gtt_total_entries(dev_priv->gtt) * 4;
+ gtt_end = gtt_start + gtt_total_entries(&dev_priv->gtt) * 4;
if (gtt_start >= stolen[0].start && gtt_start < stolen[0].end)
stolen[0].end = gtt_start;
--
2.0.3
* [RFC 07/38] drm/i915: Rename to GEN8_LEGACY_PDPES
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as is, and
using "PDPE" is a much better description.
sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f677deb..0ee258b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -310,7 +310,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
- if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+ if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
break;
if (pt_vaddr == NULL)
@@ -421,7 +421,7 @@ bail:
static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
const int max_pdp)
{
- struct page **pt_pages[GEN8_LEGACY_PDPS];
+ struct page **pt_pages[GEN8_LEGACY_PDPES];
int i, ret;
for (i = 0; i < max_pdp; i++) {
@@ -472,7 +472,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
return -ENOMEM;
ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
- BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+ BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
return 0;
}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 98427ce..2689bea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -86,7 +86,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
#define GEN8_PDE_MASK 0x1ff
#define GEN8_PTE_SHIFT 12
#define GEN8_PTE_MASK 0x1ff
-#define GEN8_LEGACY_PDPS 4
+#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES_PER_PAGE (PAGE_SIZE / sizeof(gen8_gtt_pte_t))
#define GEN8_PDES_PER_PAGE (PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
@@ -250,12 +250,12 @@ struct i915_hw_ppgtt {
unsigned num_pd_pages; /* gen8+ */
union {
struct page **pt_pages;
- struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+ struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
};
struct page *pd_pages;
union {
uint32_t pd_offset;
- dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+ dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
};
union {
dma_addr_t *pt_dma_addr;
--
2.0.3
* [RFC 08/38] drm/i915: Split out verbose PPGTT dumping
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
There often is not enough memory to dump the full contents of the PPGTT.
As a temporary bandage, to continue getting valuable basic PPGTT info,
wrap the dangerous, memory hungry part inside of a new verbose version
of the debugfs file.
Also while here we can split out the PPGTT print function so it's more
reusable.
I'd really like to get PPGTT info into our error state, but I found it too
difficult to make work in the limited time I have. Maybe Mika can find a way.
v2: Get the info for the non-default contexts. Merge a patch from Chris
into this patch (Chris). All credit goes to him.
v3: Read and pass the 'verbose' flag without overwriting m.
References: 20140320115742.GA4463@nuc-i3427.alporthouse.com
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
drivers/gpu/drm/i915/i915_debugfs.c | 66 ++++++++++++++++++++++---------------
1 file changed, 40 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2912d61..b5e5485 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2007,28 +2007,12 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
return 0;
}
-static int per_file_ctx(int id, void *ptr, void *data)
+static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt)
{
- struct intel_context *ctx = ptr;
- struct seq_file *m = data;
- struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
-
- if (!ppgtt) {
- seq_printf(m, " no ppgtt for context %d\n",
- ctx->user_handle);
- return 0;
- }
-
- if (i915_gem_context_is_default(ctx))
- seq_puts(m, " default context:\n");
- else
- seq_printf(m, " context %d:\n", ctx->user_handle);
- ppgtt->debug_dump(ppgtt, m);
-
- return 0;
+ seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
}
-static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
@@ -2052,7 +2036,34 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
}
}
-static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static int per_file_ctx(int id, void *ptr, void *data)
+{
+ struct intel_context *ctx = ptr;
+ struct seq_file *m = data;
+ struct drm_info_node *node = m->private;
+ bool verbose = (uintptr_t)node->info_ent->data & 1;
+ struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
+
+ if (!ppgtt) {
+ seq_printf(m, " no ppgtt for context %d\n",
+ ctx->user_handle);
+ return 0;
+ }
+
+ if (i915_gem_context_is_default(ctx))
+ seq_puts(m, " default context:\n");
+ else
+ seq_printf(m, " context %d:\n", ctx->user_handle);
+
+ print_ppgtt(m, ppgtt);
+ /* XXX: Dumper missing for gen8+ */
+ if (verbose && ppgtt->debug_dump)
+ ppgtt->debug_dump(ppgtt, m);
+
+ return 0;
+}
+
+static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
@@ -2074,9 +2085,9 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
seq_puts(m, "aliasing PPGTT:\n");
- seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
-
- ppgtt->debug_dump(ppgtt, m);
+ print_ppgtt(m, ppgtt);
+ if (verbose)
+ ppgtt->debug_dump(ppgtt, m);
}
list_for_each_entry_reverse(file, &dev->filelist, lhead) {
@@ -2084,7 +2095,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
seq_printf(m, "proc: %s\n",
get_pid_task(file->pid, PIDTYPE_PID)->comm);
- idr_for_each(&file_priv->context_idr, per_file_ctx, m);
+ idr_for_each(&file_priv->context_idr, per_file_ctx,
+ (void *)(unsigned long)m);
}
seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
}
@@ -2094,6 +2106,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+ bool verbose = (uintptr_t)node->info_ent->data & 1;
int ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -2101,9 +2114,9 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
intel_runtime_pm_get(dev_priv);
if (INTEL_INFO(dev)->gen >= 8)
- gen8_ppgtt_info(m, dev);
+ gen8_ppgtt_info(m, dev, verbose);
else if (INTEL_INFO(dev)->gen >= 6)
- gen6_ppgtt_info(m, dev);
+ gen6_ppgtt_info(m, dev, verbose);
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);
@@ -4182,6 +4195,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
{"i915_swizzle_info", i915_swizzle_info, 0},
{"i915_ppgtt_info", i915_ppgtt_info, 0},
+ {"i915_ppgtt_verbose_info", i915_ppgtt_info, 0, (void *)1},
{"i915_llc", i915_llc, 0},
{"i915_edp_psr_status", i915_edp_psr_status, 0},
{"i915_sink_crc_eDP1", i915_sink_crc, 0},
--
2.0.3
* [RFC 09/38] drm/i915: s/pd/pdpe, s/pt/pde
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (7 preceding siblings ...)
2014-10-07 17:11 ` [RFC 08/38] drm/i915: Split out verbose PPGTT dumping Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-08 13:55 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 10/38] drm/i915: rename map/unmap to dma_map/unmap Michel Thierry
` (30 subsequent siblings)
39 siblings, 1 reply; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
With the new style of page table data structures, the correct way to
think about these parameters is as the entry being indexed into the
array; "pd" and "pt" are not representative of what the operation is
doing.
The clarity here will improve the readability of future patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0ee258b..12da57a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -502,40 +502,40 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
}
static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
- const int pd)
+ const int pdpe)
{
dma_addr_t pd_addr;
int ret;
pd_addr = pci_map_page(ppgtt->base.dev->pdev,
- &ppgtt->pd_pages[pd], 0,
+ &ppgtt->pd_pages[pdpe], 0,
PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
if (ret)
return ret;
- ppgtt->pd_dma_addr[pd] = pd_addr;
+ ppgtt->pd_dma_addr[pdpe] = pd_addr;
return 0;
}
static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
- const int pd,
- const int pt)
+ const int pdpe,
+ const int pde)
{
dma_addr_t pt_addr;
struct page *p;
int ret;
- p = ppgtt->gen8_pt_pages[pd][pt];
+ p = ppgtt->gen8_pt_pages[pdpe][pde];
pt_addr = pci_map_page(ppgtt->base.dev->pdev,
p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
if (ret)
return ret;
- ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+ ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
return 0;
}
--
2.0.3
* [RFC 10/38] drm/i915: rename map/unmap to dma_map/unmap
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (8 preceding siblings ...)
2014-10-07 17:11 ` [RFC 09/38] drm/i915: s/pd/pdpe, s/pt/pde Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 11/38] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
` (29 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Upcoming patches will use the terms map and unmap to refer to the page
table entries, so rename the DMA helpers now. Having this distinction
will really help with code clarity at that point.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 12da57a..ec39de6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -363,7 +363,7 @@ static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
}
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
{
struct pci_dev *hwdev = ppgtt->base.dev->pdev;
int i, j;
@@ -391,7 +391,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- gen8_ppgtt_unmap_pages(ppgtt);
+ gen8_ppgtt_dma_unmap_pages(ppgtt);
gen8_ppgtt_free(ppgtt);
}
@@ -617,7 +617,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
return 0;
bail:
- gen8_ppgtt_unmap_pages(ppgtt);
+ gen8_ppgtt_dma_unmap_pages(ppgtt);
gen8_ppgtt_free(ppgtt);
return ret;
}
@@ -903,7 +903,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
kunmap_atomic(pt_vaddr);
}
-static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
{
int i;
@@ -932,7 +932,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
drm_mm_remove_node(&ppgtt->node);
- gen6_ppgtt_unmap_pages(ppgtt);
+ gen6_ppgtt_dma_unmap_pages(ppgtt);
gen6_ppgtt_free(ppgtt);
}
@@ -1032,7 +1032,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
PCI_DMA_BIDIRECTIONAL);
if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
- gen6_ppgtt_unmap_pages(ppgtt);
+ gen6_ppgtt_dma_unmap_pages(ppgtt);
return -EIO;
}
--
2.0.3
* [RFC 11/38] drm/i915: Setup less PPGTT on failed pagedir
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (9 preceding siblings ...)
2014-10-07 17:11 ` [RFC 10/38] drm/i915: rename map/unmap to dma_map/unmap Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 12/38] drm/i915: Un-hardcode number of page directories Michel Thierry
` (28 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
When page directory setup fails, the current code can both print a WARN
and set up part of the PPGTT structure. Neither of these harms the
current code; the change is simply for clarity, and to perhaps prevent
later bugs or weird debug messages.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ec39de6..00b5e5a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -967,11 +967,14 @@ alloc:
goto alloc;
}
+ if (ret)
+ return ret;
+
if (ppgtt->node.start < dev_priv->gtt.mappable_end)
DRM_DEBUG("Forced to use aperture for PDEs\n");
ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
- return ret;
+ return 0;
}
static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
--
2.0.3
* [RFC 12/38] drm/i915: Un-hardcode number of page directories
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (10 preceding siblings ...)
2014-10-07 17:11 ` [RFC 11/38] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 13/38] drm/i915: Make gen6_write_pdes gen6_map_page_tables Michel Thierry
` (27 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
trivial.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 2689bea..e32c00a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -259,7 +259,7 @@ struct i915_hw_ppgtt {
};
union {
dma_addr_t *pt_dma_addr;
- dma_addr_t *gen8_pt_dma_addr[4];
+ dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
};
struct drm_i915_file_private *file_priv;
--
2.0.3
* [RFC 13/38] drm/i915: Make gen6_write_pdes gen6_map_page_tables
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (11 preceding siblings ...)
2014-10-07 17:11 ` [RFC 12/38] drm/i915: Un-hardcode number of page directories Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-08 14:04 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 14/38] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
` (26 subsequent siblings)
39 siblings, 1 reply; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Split out mapping a single page table, which will help with upcoming
work. While here, also rename the function to better describe what it
does - although this function is going away soon.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 00b5e5a..f5a1ac9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -678,26 +678,33 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
}
}
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
+ const unsigned pde_index,
+ dma_addr_t daddr)
{
struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
- gen6_gtt_pte_t __iomem *pd_addr;
uint32_t pd_entry;
+ gen6_gtt_pte_t __iomem *pd_addr =
+ (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+
+ pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+ pd_entry |= GEN6_PDE_VALID;
+
+ writel(pd_entry, pd_addr + pde_index);
+}
+
+/* Map all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+ struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
int i;
WARN_ON(ppgtt->pd_offset & 0x3f);
- pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
- ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
- for (i = 0; i < ppgtt->num_pd_entries; i++) {
- dma_addr_t pt_addr;
-
- pt_addr = ppgtt->pt_dma_addr[i];
- pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
- pd_entry |= GEN6_PDE_VALID;
+ for (i = 0; i < ppgtt->num_pd_entries; i++)
+ gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
- writel(pd_entry, pd_addr + i);
- }
- readl(pd_addr);
+ readl(dev_priv->gtt.gsm);
}
static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1087,7 +1094,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->node.size >> 20,
ppgtt->node.start / PAGE_SIZE);
- gen6_write_pdes(ppgtt);
+ gen6_map_page_tables(ppgtt);
DRM_DEBUG("Adding PPGTT at offset %x\n",
ppgtt->pd_offset << 10);
@@ -1365,11 +1372,11 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
/* TODO: Perhaps it shouldn't be gen6 specific */
if (i915_is_ggtt(vm)) {
if (dev_priv->mm.aliasing_ppgtt)
- gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
+ gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
continue;
}
- gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+ gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
}
i915_ggtt_flush(dev_priv);
--
2.0.3
* [RFC 14/38] drm/i915: Range clearing is PPGTT agnostic
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (12 preceding siblings ...)
2014-10-07 17:11 ` [RFC 13/38] drm/i915: Make gen6_write_pdes gen6_map_page_tables Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 15/38] drm/i915: Page table helpers, and define renames Michel Thierry
` (25 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Range clearing does not depend on the specific PPGTT implementation, so
we can do it from our general init function. Eventually, I hope to have
a lot more commonality like this. It won't arrive yet, but this was a
nice easy one.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f5a1ac9..84bcfc6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,8 +607,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->base.start = 0;
ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
- ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1088,8 +1086,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_offset =
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
- ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
ppgtt->node.size >> 20,
ppgtt->node.start / PAGE_SIZE);
@@ -1125,6 +1121,8 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
kref_init(&ppgtt->ref);
drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
ppgtt->base.total);
+ ppgtt->base.clear_range(&ppgtt->base, 0,
+ ppgtt->base.total, true);
i915_init_vm(dev_priv, &ppgtt->base);
}
--
2.0.3
* [RFC 15/38] drm/i915: Page table helpers, and define renames
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (13 preceding siblings ...)
2014-10-07 17:11 ` [RFC 14/38] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 16/38] drm/i915: construct page table abstractions Michel Thierry
` (24 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
These page table helpers make the code much cleaner. There is some room
to use the arch/x86 header files, but I've opted not to because, in
several cases, those definitions are dictated by CONFIG_ options which
do not always match the restrictions in the GPU. While here, clean up
the defines to have more concise names, and consolidate between gen6
and gen8 where appropriate.
v2: Use I915_PAGE_SIZE to remove PAGE_SIZE dep in the new code (Jesse)
Fix bugged I915_PTE_MASK define, which was unused (Chris)
BUG_ON bad length/size - taking directly from Chris (Chris)
define NUM_PTE (Chris)
I've made a lot of tiny errors in these helpers. Often I'd correct an
error only to introduce another one. While IGT was capable of catching
them, the tests often took a while to catch, and were hard/slow to
debug in the kernel. As a result, to test this, I compiled
i915_gem_gtt.h in userspace, and ran tests from userspace. What follows
isn't by any means complete, but it was able to catch a lot of bugs. Gen8
is also untested, but since the current code is almost identical, I feel
pretty comfortable with that.
void test_pte(uint32_t base) {
uint32_t ret;
assert_pte_index((base + 0), 0);
assert_pte_index((base + 1), 0);
assert_pte_index((base + 0x1000), 1);
assert_pte_index((base + (1<<22)), 0);
assert_pte_index((base + ((1<<22) - 1)), 1023);
assert_pte_index((base + (1<<21)), 512);
assert_pte_count(base + 0, 0, 0);
assert_pte_count(base + 0, 1, 1);
assert_pte_count(base + 0, 0x1000, 1);
assert_pte_count(base + 0, 0x1001, 2);
assert_pte_count(base + 0, 1<<21, 512);
assert_pte_count(base + 0, 1<<22, 1024);
assert_pte_count(base + 0, (1<<22) - 1, 1024);
assert_pte_count(base + (1<<21), 1<<22, 512);
assert_pte_count(base + (1<<21), (1<<22)+1, 512);
assert_pte_count(base + (1<<21), 10<<22, 512);
}
void test_pde(uint32_t base) {
assert(gen6_pde_index(base + 0) == 0);
assert(gen6_pde_index(base + 1) == 0);
assert(gen6_pde_index(base + (1<<21)) == 0);
assert(gen6_pde_index(base + (1<<22)) == 1);
assert(gen6_pde_index(base + ((256<<22)))== 256);
assert(gen6_pde_index(base + ((512<<22))) == 0);
assert(gen6_pde_index(base + ((513<<22))) == 1); /* This is actually not possible on gen6 */
assert(gen6_pde_count(base + 0, 0) == 0);
assert(gen6_pde_count(base + 0, 1) == 1);
assert(gen6_pde_count(base + 0, 1<<21) == 1);
assert(gen6_pde_count(base + 0, 1<<22) == 1);
assert(gen6_pde_count(base + 0, (1<<22) + 0x1000) == 2);
assert(gen6_pde_count(base + 0x1000, 1<<22) == 2);
assert(gen6_pde_count(base + 0, 511<<22) == 511);
assert(gen6_pde_count(base + 0, 512<<22) == 512);
assert(gen6_pde_count(base + 0x1000, 512<<22) == 512);
assert(gen6_pde_count(base + (1<<22), 512<<22) == 511);
}
int main()
{
test_pde(0);
while (1)
test_pte(rand() & ~((1<<22) - 1));
return 0;
}
v3: Some small rebase conflicts resolved
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 89 +++++++++++++-------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 123 +++++++++++++++++++++++++++++++++---
2 files changed, 156 insertions(+), 56 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 84bcfc6..0607334 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -239,7 +239,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int i, ret;
/* bit of a hack to find the actual last used pd */
- int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
+ int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
for (i = used_pd - 1; i >= 0; i--) {
dma_addr_t addr = ppgtt->pd_dma_addr[i];
@@ -259,9 +259,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_gtt_pte_t *pt_vaddr, scratch_pte;
- unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
- unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
- unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+ unsigned pdpe = gen8_pdpe_index(start);
+ unsigned pde = gen8_pde_index(start);
+ unsigned pte = gen8_pte_index(start);
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
@@ -272,8 +272,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
last_pte = pte + num_entries;
- if (last_pte > GEN8_PTES_PER_PAGE)
- last_pte = GEN8_PTES_PER_PAGE;
+ if (last_pte > GEN8_PTES_PER_PT)
+ last_pte = GEN8_PTES_PER_PT;
pt_vaddr = kmap_atomic(page_table);
@@ -287,7 +287,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
kunmap_atomic(pt_vaddr);
pte = 0;
- if (++pde == GEN8_PDES_PER_PAGE) {
+ if (++pde == I915_PDES_PER_PD) {
pdpe++;
pde = 0;
}
@@ -302,9 +302,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_gtt_pte_t *pt_vaddr;
- unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
- unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
- unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+ unsigned pdpe = gen8_pdpe_index(start);
+ unsigned pde = gen8_pde_index(start);
+ unsigned pte = gen8_pte_index(start);
struct sg_page_iter sg_iter;
pt_vaddr = NULL;
@@ -319,12 +319,12 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
pt_vaddr[pte] =
gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
cache_level, true);
- if (++pte == GEN8_PTES_PER_PAGE) {
+ if (++pte == GEN8_PTES_PER_PT) {
if (!HAS_LLC(ppgtt->base.dev))
drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
kunmap_atomic(pt_vaddr);
pt_vaddr = NULL;
- if (++pde == GEN8_PDES_PER_PAGE) {
+ if (++pde == I915_PDES_PER_PD) {
pdpe++;
pde = 0;
}
@@ -345,7 +345,7 @@ static void gen8_free_page_tables(struct page **pt_pages)
if (pt_pages == NULL)
return;
- for (i = 0; i < GEN8_PDES_PER_PAGE; i++)
+ for (i = 0; i < I915_PDES_PER_PD; i++)
if (pt_pages[i])
__free_pages(pt_pages[i], 0);
}
@@ -377,7 +377,7 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
PCI_DMA_BIDIRECTIONAL);
- for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+ for (j = 0; j < I915_PDES_PER_PD; j++) {
dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
if (addr)
pci_unmap_page(hwdev, addr, PAGE_SIZE,
@@ -400,11 +400,11 @@ static struct page **__gen8_alloc_page_tables(void)
struct page **pt_pages;
int i;
- pt_pages = kcalloc(GEN8_PDES_PER_PAGE, sizeof(struct page *), GFP_KERNEL);
+ pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
if (!pt_pages)
return ERR_PTR(-ENOMEM);
- for (i = 0; i < GEN8_PDES_PER_PAGE; i++) {
+ for (i = 0; i < I915_PDES_PER_PD; i++) {
pt_pages[i] = alloc_page(GFP_KERNEL);
if (!pt_pages[i])
goto bail;
@@ -454,7 +454,7 @@ static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- ppgtt->gen8_pt_dma_addr[i] = kcalloc(GEN8_PDES_PER_PAGE,
+ ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
sizeof(dma_addr_t),
GFP_KERNEL);
if (!ppgtt->gen8_pt_dma_addr[i])
@@ -492,7 +492,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
return ret;
}
- ppgtt->num_pd_entries = max_pdp * GEN8_PDES_PER_PAGE;
+ ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
ret = gen8_ppgtt_allocate_dma(ppgtt);
if (ret)
@@ -553,7 +553,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
{
const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
- const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
+ const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
int i, j, ret;
if (size % (1<<30))
@@ -572,7 +572,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
if (ret)
goto bail;
- for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+ for (j = 0; j < I915_PDES_PER_PD; j++) {
ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
if (ret)
goto bail;
@@ -590,7 +590,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
for (i = 0; i < max_pdp; i++) {
gen8_ppgtt_pde_t *pd_vaddr;
pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
- for (j = 0; j < GEN8_PDES_PER_PAGE; j++) {
+ for (j = 0; j < I915_PDES_PER_PD; j++) {
dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
I915_CACHE_LLC);
@@ -605,7 +605,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
ppgtt->base.start = 0;
- ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
+ ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PT * PAGE_SIZE;
DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
@@ -651,9 +651,9 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
seq_printf(m, "\tPDE: %x\n", pd_entry);
pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
- for (pte = 0; pte < I915_PPGTT_PT_ENTRIES; pte+=4) {
+ for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
unsigned long va =
- (pde * PAGE_SIZE * I915_PPGTT_PT_ENTRIES) +
+ (pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
(pte * PAGE_SIZE);
int i;
bool found = false;
@@ -849,29 +849,28 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen6_gtt_pte_t *pt_vaddr, scratch_pte;
- unsigned first_entry = start >> PAGE_SHIFT;
+ unsigned pde = gen6_pde_index(start);
unsigned num_entries = length >> PAGE_SHIFT;
- unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
- unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+ unsigned pte = gen6_pte_index(start);
unsigned last_pte, i;
scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
while (num_entries) {
- last_pte = first_pte + num_entries;
- if (last_pte > I915_PPGTT_PT_ENTRIES)
- last_pte = I915_PPGTT_PT_ENTRIES;
+ last_pte = pte + num_entries;
+ if (last_pte > GEN6_PTES_PER_PT)
+ last_pte = GEN6_PTES_PER_PT;
- pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+ pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
- for (i = first_pte; i < last_pte; i++)
+ for (i = pte; i < last_pte; i++)
pt_vaddr[i] = scratch_pte;
kunmap_atomic(pt_vaddr);
- num_entries -= last_pte - first_pte;
- first_pte = 0;
- act_pt++;
+ num_entries -= last_pte - pte;
+ pte = 0;
+ pde++;
}
}
@@ -883,25 +882,23 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen6_gtt_pte_t *pt_vaddr;
- unsigned first_entry = start >> PAGE_SHIFT;
- unsigned act_pt = first_entry / I915_PPGTT_PT_ENTRIES;
- unsigned act_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+ unsigned pde = gen6_pde_index(start);
+ unsigned pte = gen6_pte_index(start);
struct sg_page_iter sg_iter;
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
if (pt_vaddr == NULL)
- pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pt]);
+ pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
- pt_vaddr[act_pte] =
+ pt_vaddr[pte] =
vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
cache_level, true, flags);
-
- if (++act_pte == I915_PPGTT_PT_ENTRIES) {
+ if (++pte == GEN6_PTES_PER_PT) {
kunmap_atomic(pt_vaddr);
pt_vaddr = NULL;
- act_pt++;
- act_pte = 0;
+ pde++;
+ pte = 0;
}
}
if (pt_vaddr)
@@ -978,7 +975,7 @@ alloc:
if (ppgtt->node.start < dev_priv->gtt.mappable_end)
DRM_DEBUG("Forced to use aperture for PDEs\n");
- ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
+ ppgtt->num_pd_entries = I915_PDES_PER_PD;
return 0;
}
@@ -1080,7 +1077,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
ppgtt->base.start = 0;
- ppgtt->base.total = ppgtt->num_pd_entries * I915_PPGTT_PT_ENTRIES * PAGE_SIZE;
+ ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
ppgtt->debug_dump = gen6_dump_ppgtt;
ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e32c00a..d432f2d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -40,8 +40,16 @@ typedef uint32_t gen6_gtt_pte_t;
typedef uint64_t gen8_gtt_pte_t;
typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
-#define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
-/* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
+/* GEN Agnostic defines */
+#define I915_PAGE_SIZE 4096
+#define I915_PDES_PER_PD 512
+#define I915_PTE_MASK (I915_PAGE_SIZE-1)
+#define I915_PDE_MASK (I915_PDES_PER_PD-1)
+
+/* GEN6 PPGTT resembles a 2 level page table:
+ * 31:22 | 21:12 | 11:0
+ * PDE | PTE | offset
+ */
#define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0))
#define GEN6_PTE_ADDR_ENCODE(addr) GEN6_GTT_ADDR_ENCODE(addr)
#define GEN6_PDE_ADDR_ENCODE(addr) GEN6_GTT_ADDR_ENCODE(addr)
@@ -49,13 +57,16 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
#define GEN6_PTE_UNCACHED (1 << 1)
#define GEN6_PTE_VALID (1 << 0)
-#define GEN6_PPGTT_PD_ENTRIES 512
-#define GEN6_PD_SIZE (GEN6_PPGTT_PD_ENTRIES * PAGE_SIZE)
+#define GEN6_PD_SIZE (I915_PDES_PER_PD * PAGE_SIZE)
#define GEN6_PD_ALIGN (PAGE_SIZE * 16)
#define GEN6_PDE_VALID (1 << 0)
#define GEN7_PTE_CACHE_L3_LLC (3 << 1)
+#define GEN6_PDE_SHIFT 22
+#define GEN6_PTES_PER_PT (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
+#define NUM_PTE(pde_shift) (1 << (pde_shift - PAGE_SHIFT))
+
#define BYT_PTE_SNOOPED_BY_CPU_CACHES (1 << 2)
#define BYT_PTE_WRITEABLE (1 << 1)
@@ -74,6 +85,14 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
#define HSW_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0x7f0))
#define HSW_PTE_ADDR_ENCODE(addr) HSW_GTT_ADDR_ENCODE(addr)
+#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
+#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
+#define PPAT_CACHED_INDEX _PAGE_PAT /* WB LLCeLLC */
+#define PPAT_DISPLAY_ELLC_INDEX _PAGE_PCD /* WT eLLC */
+
+#define GEN8_LEGACY_PDPES 4
+#define GEN8_PTES_PER_PT (PAGE_SIZE / sizeof(gen8_gtt_pte_t))
+
/* GEN8 legacy style address is defined as a 3 level page table:
* 31:30 | 29:21 | 20:12 | 11:0
* PDPE | PDE | PTE | offset
@@ -83,12 +102,6 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
#define GEN8_PDPE_SHIFT 30
#define GEN8_PDPE_MASK 0x3
#define GEN8_PDE_SHIFT 21
-#define GEN8_PDE_MASK 0x1ff
-#define GEN8_PTE_SHIFT 12
-#define GEN8_PTE_MASK 0x1ff
-#define GEN8_LEGACY_PDPES 4
-#define GEN8_PTES_PER_PAGE (PAGE_SIZE / sizeof(gen8_gtt_pte_t))
-#define GEN8_PDES_PER_PAGE (PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
@@ -270,6 +283,96 @@ struct i915_hw_ppgtt {
void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
};
+static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
+{
+ const uint32_t mask = NUM_PTE(pde_shift) - 1;
+ return (address >> PAGE_SHIFT) & mask;
+}
+
+/* Helper to count the number of PTEs within the given length. This count does
+ * not cross a page table boundary, so the max value would be
+ * GEN6_PTES_PER_PT for GEN6, and GEN8_PTES_PER_PT for GEN8.
+ */
+static inline size_t i915_pte_count(uint64_t addr, size_t length,
+ uint32_t pde_shift)
+{
+ const uint64_t mask = ~((1 << pde_shift) - 1);
+ uint64_t end;
+
+ BUG_ON(length == 0);
+ BUG_ON(offset_in_page(addr|length));
+
+ end = addr + length;
+
+ if ((addr & mask) != (end & mask))
+ return NUM_PTE(pde_shift) - i915_pte_index(addr, pde_shift);
+
+ return i915_pte_index(end, pde_shift) - i915_pte_index(addr, pde_shift);
+}
+
+static inline uint32_t i915_pde_index(uint64_t addr, uint32_t shift)
+{
+ return (addr >> shift) & I915_PDE_MASK;
+}
+
+static inline size_t i915_pde_count(uint64_t addr, uint64_t length,
+ uint32_t pde_shift)
+{
+ const uint32_t pdp_shift = pde_shift + 9;
+ const uint64_t mask = ~((1 << pdp_shift) - 1);
+ uint64_t end;
+
+ BUG_ON(length == 0);
+ BUG_ON(offset_in_page(addr|length));
+
+ end = addr + length;
+
+ if ((addr & mask) != (end & mask))
+ return I915_PDES_PER_PD - i915_pde_index(addr, pde_shift);
+
+ return i915_pde_index(end, pde_shift) - i915_pde_index(addr, pde_shift);
+}
+
+static inline uint32_t gen6_pte_index(uint32_t addr)
+{
+ return i915_pte_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pte_count(uint32_t addr, uint32_t length)
+{
+ return i915_pte_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen6_pde_index(uint32_t addr)
+{
+ return i915_pde_index(addr, GEN6_PDE_SHIFT);
+}
+
+static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
+{
+ return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pte_index(uint64_t address)
+{
+ return i915_pte_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pde_index(uint64_t address)
+{
+ return i915_pde_index(address, GEN8_PDE_SHIFT);
+}
+
+static inline uint32_t gen8_pdpe_index(uint64_t address)
+{
+ return (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
+}
+
+static inline uint32_t gen8_pml4e_index(uint64_t address)
+{
+ BUG();
+}
+
int i915_gem_gtt_init(struct drm_device *dev);
void i915_gem_init_global_gtt(struct drm_device *dev);
int i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
--
2.0.3
* [RFC 16/38] drm/i915: construct page table abstractions
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Thus far we've opted to write complex code that requires difficult review. In
the future the code is only going to become more complex, so we'll take the
hit now and start to encapsulate things.
To help transition the code nicely, there is some wasted space in gen6/7.
This will be ameliorated shortly.
NOTE: The pun in the subject was intentional.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 174 ++++++++++++++++++------------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 23 +++--
2 files changed, 104 insertions(+), 93 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0607334..904457e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -269,7 +269,8 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
I915_CACHE_LLC, use_scratch);
while (num_entries) {
- struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+ struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+ struct page *page_table = pd->page_tables[pde].page;
last_pte = pte + num_entries;
if (last_pte > GEN8_PTES_PER_PT)
@@ -313,8 +314,11 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
break;
- if (pt_vaddr == NULL)
- pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+ if (pt_vaddr == NULL) {
+ struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+ struct page *page_table = pd->page_tables[pde].page;
+ pt_vaddr = kmap_atomic(page_table);
+ }
pt_vaddr[pte] =
gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -338,29 +342,33 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
}
}
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_pagedir *pd)
{
int i;
- if (pt_pages == NULL)
+ if (pd->page_tables == NULL)
return;
for (i = 0; i < I915_PDES_PER_PD; i++)
- if (pt_pages[i])
- __free_pages(pt_pages[i], 0);
+ if (pd->page_tables[i].page)
+ __free_page(pd->page_tables[i].page);
+}
+
+static void gen8_free_page_directories(struct i915_pagedir *pd)
+{
+ kfree(pd->page_tables);
+ __free_page(pd->page);
}
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
int i;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
- kfree(ppgtt->gen8_pt_pages[i]);
+ gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+ gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
kfree(ppgtt->gen8_pt_dma_addr[i]);
}
-
- __free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
}
static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -395,86 +403,73 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
gen8_ppgtt_free(ppgtt);
}
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
{
- struct page **pt_pages;
int i;
- pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
- if (!pt_pages)
- return ERR_PTR(-ENOMEM);
-
- for (i = 0; i < I915_PDES_PER_PD; i++) {
- pt_pages[i] = alloc_page(GFP_KERNEL);
- if (!pt_pages[i])
- goto bail;
+ for (i = 0; i < ppgtt->num_pd_pages; i++) {
+ ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
+ sizeof(dma_addr_t),
+ GFP_KERNEL);
+ if (!ppgtt->gen8_pt_dma_addr[i])
+ return -ENOMEM;
}
- return pt_pages;
-
-bail:
- gen8_free_page_tables(pt_pages);
- kfree(pt_pages);
- return ERR_PTR(-ENOMEM);
+ return 0;
}
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
- const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
{
- struct page **pt_pages[GEN8_LEGACY_PDPES];
- int i, ret;
+ int i, j;
- for (i = 0; i < max_pdp; i++) {
- pt_pages[i] = __gen8_alloc_page_tables();
- if (IS_ERR(pt_pages[i])) {
- ret = PTR_ERR(pt_pages[i]);
- goto unwind_out;
+ for (i = 0; i < ppgtt->num_pd_pages; i++) {
+ for (j = 0; j < I915_PDES_PER_PD; j++) {
+ struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+ pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!pt->page)
+ goto unwind_out;
}
}
- /* NB: Avoid touching gen8_pt_pages until last to keep the allocation,
- * "atomic" - for cleanup purposes.
- */
- for (i = 0; i < max_pdp; i++)
- ppgtt->gen8_pt_pages[i] = pt_pages[i];
-
return 0;
unwind_out:
- while (i--) {
- gen8_free_page_tables(pt_pages[i]);
- kfree(pt_pages[i]);
- }
+ while (i--)
+ gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
- return ret;
+ return -ENOMEM;
}
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
+ const int max_pdp)
{
int i;
- for (i = 0; i < ppgtt->num_pd_pages; i++) {
- ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
- sizeof(dma_addr_t),
- GFP_KERNEL);
- if (!ppgtt->gen8_pt_dma_addr[i])
- return -ENOMEM;
- }
+ for (i = 0; i < max_pdp; i++) {
+ struct i915_pagetab *pt;
+ pt = kcalloc(I915_PDES_PER_PD, sizeof(*pt), GFP_KERNEL);
+ if (!pt)
+ goto unwind_out;
- return 0;
-}
+ ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL);
+ if (!ppgtt->pdp.pagedir[i].page)
+ goto unwind_out;
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
- const int max_pdp)
-{
- ppgtt->pd_pages = alloc_pages(GFP_KERNEL, get_order(max_pdp << PAGE_SHIFT));
- if (!ppgtt->pd_pages)
- return -ENOMEM;
+ ppgtt->pdp.pagedir[i].page_tables = pt;
+ }
- ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
+ ppgtt->num_pd_pages = max_pdp;
BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
return 0;
+
+unwind_out:
+ while (i--) {
+ kfree(ppgtt->pdp.pagedir[i].page_tables);
+ __free_page(ppgtt->pdp.pagedir[i].page);
+ }
+
+ return -ENOMEM;
}
static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
@@ -486,18 +481,19 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
if (ret)
return ret;
- ret = gen8_ppgtt_allocate_page_tables(ppgtt, max_pdp);
- if (ret) {
- __free_pages(ppgtt->pd_pages, get_order(max_pdp << PAGE_SHIFT));
- return ret;
- }
+ ret = gen8_ppgtt_allocate_page_tables(ppgtt);
+ if (ret)
+ goto err_out;
ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
ret = gen8_ppgtt_allocate_dma(ppgtt);
- if (ret)
- gen8_ppgtt_free(ppgtt);
+ if (!ret)
+ return ret;
+ /* TODO: Check this for all cases */
+err_out:
+ gen8_ppgtt_free(ppgtt);
return ret;
}
@@ -508,7 +504,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
int ret;
pd_addr = pci_map_page(ppgtt->base.dev->pdev,
- &ppgtt->pd_pages[pdpe], 0,
+ ppgtt->pdp.pagedir[pdpe].page, 0,
PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
@@ -528,7 +524,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
struct page *p;
int ret;
- p = ppgtt->gen8_pt_pages[pdpe][pde];
+ p = ppgtt->pdp.pagedir[pdpe].page_tables[pde].page;
pt_addr = pci_map_page(ppgtt->base.dev->pdev,
p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
@@ -589,7 +585,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
*/
for (i = 0; i < max_pdp; i++) {
gen8_ppgtt_pde_t *pd_vaddr;
- pd_vaddr = kmap_atomic(&ppgtt->pd_pages[i]);
+ pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
for (j = 0; j < I915_PDES_PER_PD; j++) {
dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
@@ -650,7 +646,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
expected);
seq_printf(m, "\tPDE: %x\n", pd_entry);
- pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+ pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
unsigned long va =
(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
@@ -861,7 +857,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
if (last_pte > GEN6_PTES_PER_PT)
last_pte = GEN6_PTES_PER_PT;
- pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+ pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
for (i = pte; i < last_pte; i++)
pt_vaddr[i] = scratch_pte;
@@ -889,7 +885,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
if (pt_vaddr == NULL)
- pt_vaddr = kmap_atomic(ppgtt->pt_pages[pde]);
+ pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
pt_vaddr[pte] =
vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -923,8 +919,8 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
kfree(ppgtt->pt_dma_addr);
for (i = 0; i < ppgtt->num_pd_entries; i++)
- __free_page(ppgtt->pt_pages[i]);
- kfree(ppgtt->pt_pages);
+ __free_page(ppgtt->pd.page_tables[i].page);
+ kfree(ppgtt->pd.page_tables);
}
static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -981,22 +977,22 @@ alloc:
static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
{
+ struct i915_pagetab *pt;
int i;
- ppgtt->pt_pages = kcalloc(ppgtt->num_pd_entries, sizeof(struct page *),
- GFP_KERNEL);
-
- if (!ppgtt->pt_pages)
+ pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
+ if (!pt)
return -ENOMEM;
for (i = 0; i < ppgtt->num_pd_entries; i++) {
- ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
- if (!ppgtt->pt_pages[i]) {
+ pt[i].page = alloc_page(GFP_KERNEL);
+ if (!pt[i].page) {
gen6_ppgtt_free(ppgtt);
return -ENOMEM;
}
}
+ ppgtt->pd.page_tables = pt;
return 0;
}
@@ -1031,9 +1027,11 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_entries; i++) {
+ struct page *page;
dma_addr_t pt_addr;
- pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,
+ page = ppgtt->pd.page_tables[i].page;
+ pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
PCI_DMA_BIDIRECTIONAL);
if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
@@ -1077,7 +1075,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
ppgtt->base.start = 0;
- ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
+ ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
ppgtt->debug_dump = gen6_dump_ppgtt;
ppgtt->pd_offset =
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d432f2d..06c5c84 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -255,6 +255,20 @@ struct i915_gtt {
unsigned long *mappable_end);
};
+struct i915_pagetab {
+ struct page *page;
+};
+
+struct i915_pagedir {
+ struct page *page; /* NULL for GEN6-GEN7 */
+ struct i915_pagetab *page_tables;
+};
+
+struct i915_pagedirpo {
+ /* struct page *page; */
+ struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+};
+
struct i915_hw_ppgtt {
struct i915_address_space base;
struct kref ref;
@@ -262,11 +276,6 @@ struct i915_hw_ppgtt {
unsigned num_pd_entries;
unsigned num_pd_pages; /* gen8+ */
union {
- struct page **pt_pages;
- struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
- };
- struct page *pd_pages;
- union {
uint32_t pd_offset;
dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
};
@@ -274,6 +283,10 @@ struct i915_hw_ppgtt {
dma_addr_t *pt_dma_addr;
dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
};
+ union {
+ struct i915_pagedirpo pdp;
+ struct i915_pagedir pd;
+ };
struct drm_i915_file_private *file_priv;
--
2.0.3
* [RFC 17/38] drm/i915: Complete page table structures
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Move the remaining members over to the new page table structures.
This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch. I simply felt it is easier to review if split.
v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.pagedir[i].daddr/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_debugfs.c | 2 +-
drivers/gpu/drm/i915/i915_gem_gtt.c | 87 +++++++++++++------------------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++---
drivers/gpu/drm/i915/intel_lrc.c | 16 +++----
4 files changed, 46 insertions(+), 73 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b5e5485..2833974 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2009,7 +2009,7 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt)
{
- seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+ seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
}
static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 904457e..2d4c2a5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -242,7 +242,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
for (i = used_pd - 1; i >= 0; i--) {
- dma_addr_t addr = ppgtt->pd_dma_addr[i];
+ dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
ret = gen8_write_pdp(ring, i, addr);
if (ret)
return ret;
@@ -367,7 +367,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
- kfree(ppgtt->gen8_pt_dma_addr[i]);
}
}
@@ -379,14 +378,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
/* TODO: In the future we'll support sparse mappings, so this
* will have to change. */
- if (!ppgtt->pd_dma_addr[i])
+ if (!ppgtt->pdp.pagedir[i].daddr)
continue;
- pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+ pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
PCI_DMA_BIDIRECTIONAL);
for (j = 0; j < I915_PDES_PER_PD; j++) {
- dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+ dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
if (addr)
pci_unmap_page(hwdev, addr, PAGE_SIZE,
PCI_DMA_BIDIRECTIONAL);
@@ -403,31 +402,18 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
gen8_ppgtt_free(ppgtt);
}
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
- int i;
-
- for (i = 0; i < ppgtt->num_pd_pages; i++) {
- ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
- sizeof(dma_addr_t),
- GFP_KERNEL);
- if (!ppgtt->gen8_pt_dma_addr[i])
- return -ENOMEM;
- }
-
- return 0;
-}
-
static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
{
int i, j;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
+ struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
for (j = 0; j < I915_PDES_PER_PD; j++) {
- struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+ struct i915_pagetab *pt = &pd->page_tables[j];
pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!pt->page)
goto unwind_out;
+
}
}
@@ -487,9 +473,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
- ret = gen8_ppgtt_allocate_dma(ppgtt);
- if (!ret)
- return ret;
+ return 0;
/* TODO: Check this for all cases */
err_out:
@@ -511,7 +495,7 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
if (ret)
return ret;
- ppgtt->pd_dma_addr[pdpe] = pd_addr;
+ ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
return 0;
}
@@ -521,17 +505,18 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
const int pde)
{
dma_addr_t pt_addr;
- struct page *p;
+ struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+ struct i915_pagetab *pt = &pd->page_tables[pde];
+ struct page *p = pt->page;
int ret;
- p = ppgtt->pdp.pagedir[pdpe].page_tables[pde].page;
pt_addr = pci_map_page(ppgtt->base.dev->pdev,
p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
if (ret)
return ret;
- ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
+ pt->daddr = pt_addr;
return 0;
}
@@ -587,7 +572,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
gen8_ppgtt_pde_t *pd_vaddr;
pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
for (j = 0; j < I915_PDES_PER_PD; j++) {
- dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+ dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
I915_CACHE_LLC);
}
@@ -628,14 +613,15 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
- ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+ ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
seq_printf(m, " VM %p (pd_offset %x-%x):\n", vm,
- ppgtt->pd_offset, ppgtt->pd_offset + ppgtt->num_pd_entries);
+ ppgtt->pd.pd_offset,
+ ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
u32 expected;
gen6_gtt_pte_t *pt_vaddr;
- dma_addr_t pt_addr = ppgtt->pt_dma_addr[pde];
+ dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
pd_entry = readl(pd_addr + pde);
expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
@@ -678,8 +664,8 @@ static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
{
struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
uint32_t pd_entry;
- gen6_gtt_pte_t __iomem *pd_addr =
- (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+ gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
+ pd_addr += ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
pd_entry |= GEN6_PDE_VALID;
@@ -694,18 +680,18 @@ static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
int i;
- WARN_ON(ppgtt->pd_offset & 0x3f);
+ WARN_ON(ppgtt->pd.pd_offset & 0x3f);
for (i = 0; i < ppgtt->num_pd_entries; i++)
- gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
+ gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i].daddr);
readl(dev_priv->gtt.gsm);
}
static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
{
- BUG_ON(ppgtt->pd_offset & 0x3f);
+ BUG_ON(ppgtt->pd.pd_offset & 0x3f);
- return (ppgtt->pd_offset / 64) << 16;
+ return (ppgtt->pd.pd_offset / 64) << 16;
}
static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
@@ -905,19 +891,16 @@ static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
{
int i;
- if (ppgtt->pt_dma_addr) {
- for (i = 0; i < ppgtt->num_pd_entries; i++)
- pci_unmap_page(ppgtt->base.dev->pdev,
- ppgtt->pt_dma_addr[i],
- 4096, PCI_DMA_BIDIRECTIONAL);
- }
+ for (i = 0; i < ppgtt->num_pd_entries; i++)
+ pci_unmap_page(ppgtt->base.dev->pdev,
+ ppgtt->pd.page_tables[i].daddr,
+ 4096, PCI_DMA_BIDIRECTIONAL);
}
static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
int i;
- kfree(ppgtt->pt_dma_addr);
for (i = 0; i < ppgtt->num_pd_entries; i++)
__free_page(ppgtt->pd.page_tables[i].page);
kfree(ppgtt->pd.page_tables);
@@ -1010,14 +993,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
return ret;
}
- ppgtt->pt_dma_addr = kcalloc(ppgtt->num_pd_entries, sizeof(dma_addr_t),
- GFP_KERNEL);
- if (!ppgtt->pt_dma_addr) {
- drm_mm_remove_node(&ppgtt->node);
- gen6_ppgtt_free(ppgtt);
- return -ENOMEM;
- }
-
return 0;
}
@@ -1039,7 +1014,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
return -EIO;
}
- ppgtt->pt_dma_addr[i] = pt_addr;
+ ppgtt->pd.page_tables[i].daddr = pt_addr;
}
return 0;
@@ -1078,7 +1053,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
ppgtt->debug_dump = gen6_dump_ppgtt;
- ppgtt->pd_offset =
+ ppgtt->pd.pd_offset =
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1087,7 +1062,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
gen6_map_page_tables(ppgtt);
DRM_DEBUG("Adding PPGTT at offset %x\n",
- ppgtt->pd_offset << 10);
+ ppgtt->pd.pd_offset << 10);
return 0;
}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 06c5c84..e59f203 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -257,10 +257,16 @@ struct i915_gtt {
struct i915_pagetab {
struct page *page;
+ dma_addr_t daddr;
};
struct i915_pagedir {
struct page *page; /* NULL for GEN6-GEN7 */
+ union {
+ uint32_t pd_offset;
+ dma_addr_t daddr;
+ };
+
struct i915_pagetab *page_tables;
};
@@ -276,14 +282,6 @@ struct i915_hw_ppgtt {
unsigned num_pd_entries;
unsigned num_pd_pages; /* gen8+ */
union {
- uint32_t pd_offset;
- dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
- };
- union {
- dma_addr_t *pt_dma_addr;
- dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
- };
- union {
struct i915_pagedirpo pdp;
struct i915_pagedir pd;
};
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7864dac..5a623b5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1719,14 +1719,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[3]);
- reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[3]);
- reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[2]);
- reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[2]);
- reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[1]);
- reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[1]);
- reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pd_dma_addr[0]);
- reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pd_dma_addr[0]);
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3].daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3].daddr);
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2].daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2].daddr);
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1].daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1].daddr);
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0].daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0].daddr);
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
--
2.0.3
* [RFC 18/38] drm/i915: Create page table allocators
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, breaking up
all of our actions into individual tasks. This makes the code easier to
write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.pagedir[i].daddr/pdp.pagedir[i]->daddr/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 226 +++++++++++++++++++++++-------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 4 +-
drivers/gpu/drm/i915/intel_lrc.c | 16 +--
3 files changed, 155 insertions(+), 91 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2d4c2a5..8a79142 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -210,6 +210,102 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
return pte;
}
+static void free_pt_single(struct i915_pagetab *pt)
+{
+ if (WARN_ON(!pt->page))
+ return;
+ __free_page(pt->page);
+ kfree(pt);
+}
+
+static struct i915_pagetab *alloc_pt_single(void)
+{
+ struct i915_pagetab *pt;
+
+ pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+ if (!pt)
+ return ERR_PTR(-ENOMEM);
+
+ pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!pt->page) {
+ kfree(pt);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate a multiple page tables
+ * @pd: The page directory which will have at least @count entries
+ * available to point to the allocated page tables.
+ * @pde: First page directory entry for which we are allocating.
+ * @count: Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+{
+ int i, ret;
+
+ /* 512 is the max page tables per pagedir on any platform.
+ * TODO: make WARN after patch series is done
+ */
+ BUG_ON(pde + count > I915_PDES_PER_PD);
+
+ for (i = pde; i < pde + count; i++) {
+ struct i915_pagetab *pt = alloc_pt_single();
+ if (IS_ERR(pt)) {
+ ret = PTR_ERR(pt);
+ goto err_out;
+ }
+ WARN(pd->page_tables[i],
+ "Leaking page directory entry %d (%pa)\n",
+ i, pd->page_tables[i]);
+ pd->page_tables[i] = pt;
+ }
+
+ return 0;
+
+err_out:
+ while (i--)
+ free_pt_single(pd->page_tables[i]);
+ return ret;
+}
+
+static void __free_pd_single(struct i915_pagedir *pd)
+{
+ __free_page(pd->page);
+ kfree(pd);
+}
+
+#define free_pd_single(pd) do { \
+ if ((pd)->page) { \
+ __free_pd_single(pd); \
+ } \
+} while (0)
+
+static struct i915_pagedir *alloc_pd_single(void)
+{
+ struct i915_pagedir *pd;
+
+ pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+ if (!pd)
+ return ERR_PTR(-ENOMEM);
+
+ pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!pd->page) {
+ kfree(pd);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ return pd;
+}
+
/* Broadwell Page Directory Pointer Descriptors */
static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
uint64_t val)
@@ -242,7 +338,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
for (i = used_pd - 1; i >= 0; i--) {
- dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
+ dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
ret = gen8_write_pdp(ring, i, addr);
if (ret)
return ret;
@@ -269,8 +365,9 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
I915_CACHE_LLC, use_scratch);
while (num_entries) {
- struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
- struct page *page_table = pd->page_tables[pde].page;
+ struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+ struct i915_pagetab *pt = pd->page_tables[pde];
+ struct page *page_table = pt->page;
last_pte = pte + num_entries;
if (last_pte > GEN8_PTES_PER_PT)
@@ -315,8 +412,9 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
break;
if (pt_vaddr == NULL) {
- struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
- struct page *page_table = pd->page_tables[pde].page;
+ struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+ struct i915_pagetab *pt = pd->page_tables[pde];
+ struct page *page_table = pt->page;
pt_vaddr = kmap_atomic(page_table);
}
@@ -346,18 +444,13 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
{
int i;
- if (pd->page_tables == NULL)
+ if (!pd->page)
return;
- for (i = 0; i < I915_PDES_PER_PD; i++)
- if (pd->page_tables[i].page)
- __free_page(pd->page_tables[i].page);
-}
-
-static void gen8_free_page_directories(struct i915_pagedir *pd)
-{
- kfree(pd->page_tables);
- __free_page(pd->page);
+ for (i = 0; i < I915_PDES_PER_PD; i++) {
+ free_pt_single(pd->page_tables[i]);
+ pd->page_tables[i] = NULL;
+ }
}
static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -365,8 +458,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
- gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
+ gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+ free_pd_single(ppgtt->pdp.pagedir[i]);
}
}
@@ -378,14 +471,16 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
/* TODO: In the future we'll support sparse mappings, so this
* will have to change. */
- if (!ppgtt->pdp.pagedir[i].daddr)
+ if (!ppgtt->pdp.pagedir[i]->daddr)
continue;
- pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
+ pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
PCI_DMA_BIDIRECTIONAL);
for (j = 0; j < I915_PDES_PER_PD; j++) {
- dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+ struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+ struct i915_pagetab *pt = pd->page_tables[j];
+ dma_addr_t addr = pt->daddr;
if (addr)
pci_unmap_page(hwdev, addr, PAGE_SIZE,
PCI_DMA_BIDIRECTIONAL);
@@ -404,24 +499,20 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
{
- int i, j;
+ int i, ret;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
- for (j = 0; j < I915_PDES_PER_PD; j++) {
- struct i915_pagetab *pt = &pd->page_tables[j];
- pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
- if (!pt->page)
- goto unwind_out;
-
- }
+ ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+ 0, I915_PDES_PER_PD);
+ if (ret)
+ goto unwind_out;
}
return 0;
unwind_out:
while (i--)
- gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+ gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
return -ENOMEM;
}
@@ -432,16 +523,9 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
int i;
for (i = 0; i < max_pdp; i++) {
- struct i915_pagetab *pt;
- pt = kcalloc(I915_PDES_PER_PD, sizeof(*pt), GFP_KERNEL);
- if (!pt)
- goto unwind_out;
-
- ppgtt->pdp.pagedir[i].page = alloc_page(GFP_KERNEL);
- if (!ppgtt->pdp.pagedir[i].page)
+ ppgtt->pdp.pagedir[i] = alloc_pd_single();
+ if (IS_ERR(ppgtt->pdp.pagedir[i]))
goto unwind_out;
-
- ppgtt->pdp.pagedir[i].page_tables = pt;
}
ppgtt->num_pd_pages = max_pdp;
@@ -450,10 +534,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
return 0;
unwind_out:
- while (i--) {
- kfree(ppgtt->pdp.pagedir[i].page_tables);
- __free_page(ppgtt->pdp.pagedir[i].page);
- }
+ while (i--)
+ free_pd_single(ppgtt->pdp.pagedir[i]);
return -ENOMEM;
}
@@ -488,14 +570,14 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
int ret;
pd_addr = pci_map_page(ppgtt->base.dev->pdev,
- ppgtt->pdp.pagedir[pdpe].page, 0,
+ ppgtt->pdp.pagedir[pdpe]->page, 0,
PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
if (ret)
return ret;
- ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
+ ppgtt->pdp.pagedir[pdpe]->daddr = pd_addr;
return 0;
}
@@ -505,8 +587,8 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
const int pde)
{
dma_addr_t pt_addr;
- struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
- struct i915_pagetab *pt = &pd->page_tables[pde];
+ struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+ struct i915_pagetab *pt = pd->page_tables[pde];
struct page *p = pt->page;
int ret;
@@ -569,10 +651,12 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
* will never need to touch the PDEs again.
*/
for (i = 0; i < max_pdp; i++) {
+ struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
gen8_ppgtt_pde_t *pd_vaddr;
- pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i].page);
+ pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
for (j = 0; j < I915_PDES_PER_PD; j++) {
- dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
+ struct i915_pagetab *pt = pd->page_tables[j];
+ dma_addr_t addr = pt->daddr;
pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
I915_CACHE_LLC);
}
@@ -621,7 +705,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
u32 expected;
gen6_gtt_pte_t *pt_vaddr;
- dma_addr_t pt_addr = ppgtt->pd.page_tables[pde].daddr;
+ dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
pd_entry = readl(pd_addr + pde);
expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
@@ -632,7 +716,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
expected);
seq_printf(m, "\tPDE: %x\n", pd_entry);
- pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+ pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
for (pte = 0; pte < GEN6_PTES_PER_PT; pte+=4) {
unsigned long va =
(pde * PAGE_SIZE * GEN6_PTES_PER_PT) +
@@ -682,7 +766,7 @@ static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
WARN_ON(ppgtt->pd.pd_offset & 0x3f);
for (i = 0; i < ppgtt->num_pd_entries; i++)
- gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i].daddr);
+ gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
readl(dev_priv->gtt.gsm);
}
@@ -843,7 +927,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
if (last_pte > GEN6_PTES_PER_PT)
last_pte = GEN6_PTES_PER_PT;
- pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+ pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
for (i = pte; i < last_pte; i++)
pt_vaddr[i] = scratch_pte;
@@ -871,7 +955,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
if (pt_vaddr == NULL)
- pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde].page);
+ pt_vaddr = kmap_atomic(ppgtt->pd.page_tables[pde]->page);
pt_vaddr[pte] =
vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -893,7 +977,7 @@ static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_entries; i++)
pci_unmap_page(ppgtt->base.dev->pdev,
- ppgtt->pd.page_tables[i].daddr,
+ ppgtt->pd.page_tables[i]->daddr,
4096, PCI_DMA_BIDIRECTIONAL);
}
@@ -902,8 +986,9 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_entries; i++)
- __free_page(ppgtt->pd.page_tables[i].page);
- kfree(ppgtt->pd.page_tables);
+ free_pt_single(ppgtt->pd.page_tables[i]);
+
+ free_pd_single(&ppgtt->pd);
}
static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
@@ -958,27 +1043,6 @@ alloc:
return 0;
}
-static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
- struct i915_pagetab *pt;
- int i;
-
- pt = kcalloc(ppgtt->num_pd_entries, sizeof(*pt), GFP_KERNEL);
- if (!pt)
- return -ENOMEM;
-
- for (i = 0; i < ppgtt->num_pd_entries; i++) {
- pt[i].page = alloc_page(GFP_KERNEL);
- if (!pt->page) {
- gen6_ppgtt_free(ppgtt);
- return -ENOMEM;
- }
- }
-
- ppgtt->pd.page_tables = pt;
- return 0;
-}
-
static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
{
int ret;
@@ -987,7 +1051,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
if (ret)
return ret;
- ret = gen6_ppgtt_allocate_page_tables(ppgtt);
+ ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
if (ret) {
drm_mm_remove_node(&ppgtt->node);
return ret;
@@ -1005,7 +1069,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
struct page *page;
dma_addr_t pt_addr;
- page = ppgtt->pd.page_tables[i].page;
+ page = ppgtt->pd.page_tables[i]->page;
pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
PCI_DMA_BIDIRECTIONAL);
@@ -1014,7 +1078,7 @@ static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
return -EIO;
}
- ppgtt->pd.page_tables[i].daddr = pt_addr;
+ ppgtt->pd.page_tables[i]->daddr = pt_addr;
}
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e59f203..329f75f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -267,12 +267,12 @@ struct i915_pagedir {
dma_addr_t daddr;
};
- struct i915_pagetab *page_tables;
+ struct i915_pagetab *page_tables[I915_PDES_PER_PD]; /* PDEs */
};
struct i915_pagedirpo {
/* struct page *page; */
- struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
+ struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
};
struct i915_hw_ppgtt {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5a623b5..6607f56 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1719,14 +1719,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3].daddr);
- reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3].daddr);
- reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2].daddr);
- reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2].daddr);
- reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1].daddr);
- reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1].daddr);
- reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0].daddr);
- reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0].daddr);
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
--
2.0.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [RFC 19/38] drm/i915: Generalize GEN6 mapping
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (17 preceding siblings ...)
2014-10-07 17:11 ` [RFC 18/38] drm/i915: Create page table allocators Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 20/38] drm/i915: Clean up pagetable DMA map & unmap Michel Thierry
` (20 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Having a more general way of doing mappings will make it easy to
map and unmap a specific page table. Specifically in this case, we
pass down the page directory + entry, and the page table to map. This
works similarly to the x86 code.
The same work will need to happen for GEN8. At that point I will try to
combine functionality.
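To illustrate the shape of the generalized mapping, here is a hypothetical, standalone C sketch of the single-PDE write. `fake_pagetab`, `map_single`, the in-memory directory array, and the simplified address encode (which just keeps the 4K-aligned bits; the real GEN6 macro differs) are all stand-ins for the driver's `writel`-based path:

```c
#include <stdint.h>

#define FAKE_PDE_VALID (1u << 0)
/* Simplified stand-in for GEN6_PDE_ADDR_ENCODE: keep bits 31:12 only. */
#define FAKE_PDE_ADDR_ENCODE(addr) ((uint32_t)((addr) & 0xfffff000u))

/* Stand-in for struct i915_pagetab: only the DMA address matters here. */
struct fake_pagetab {
	uint64_t daddr;
};

/* Write one PDE: the caller passes the directory base and the table to
 * map, mirroring gen6_map_single(pd, pde, pt). The array store stands in
 * for writel(pd_entry, ppgtt->pd_addr + pde). */
static inline void map_single(uint32_t *pd_base, int pde,
			      const struct fake_pagetab *pt)
{
	uint32_t pd_entry = FAKE_PDE_ADDR_ENCODE(pt->daddr);

	pd_entry |= FAKE_PDE_VALID;
	pd_base[pde] = pd_entry;
}
```

As in the patch, the write-completion flush (the `readl` of the GSM) is left to the caller so a range of PDEs can be written with a single flush at the end.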
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 62 ++++++++++++++++++++-----------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 2 ++
2 files changed, 35 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8a79142..2f01601 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -687,18 +687,13 @@ bail:
static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
{
- struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
struct i915_address_space *vm = &ppgtt->base;
- gen6_gtt_pte_t __iomem *pd_addr;
gen6_gtt_pte_t scratch_pte;
uint32_t pd_entry;
int pte, pde;
scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
- pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
- ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
seq_printf(m, " VM %p (pd_offset %x-%x):\n", vm,
ppgtt->pd.pd_offset,
ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
@@ -706,7 +701,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
u32 expected;
gen6_gtt_pte_t *pt_vaddr;
dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
- pd_entry = readl(pd_addr + pde);
+ pd_entry = readl(ppgtt->pd_addr + pde);
expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
if (pd_entry != expected)
@@ -742,39 +737,43 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
}
}
-static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
- const unsigned pde_index,
- dma_addr_t daddr)
+/* Map pde (index) from the page directory @pd to the page table @pt */
+static void gen6_map_single(struct i915_pagedir *pd,
+ const int pde, struct i915_pagetab *pt)
{
- struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
- uint32_t pd_entry;
- gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
- pd_addr += ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(pd, struct i915_hw_ppgtt, pd);
+ u32 pd_entry;
- pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+ pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
pd_entry |= GEN6_PDE_VALID;
- writel(pd_entry, pd_addr + pde_index);
+ writel(pd_entry, ppgtt->pd_addr + pde);
+
+ /* XXX: Caller needs to make sure the write completes if necessary */
}
/* Map all the page tables found in the ppgtt structure to incrementing page
* directories. */
-static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_page_range(struct drm_i915_private *dev_priv,
+ struct i915_pagedir *pd, unsigned pde, size_t n)
{
- struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
- int i;
+ if (WARN_ON(pde + n > I915_PDES_PER_PD))
+ n = I915_PDES_PER_PD - pde;
- WARN_ON(ppgtt->pd.pd_offset & 0x3f);
- for (i = 0; i < ppgtt->num_pd_entries; i++)
- gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
+ n += pde;
+ for (; pde < n; pde++)
+ gen6_map_single(pd, pde, pd->page_tables[pde]);
+
+ /* Make sure write is complete before other code can use this page
+ * table. Also require for WC mapped PTEs */
readl(dev_priv->gtt.gsm);
}
static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
{
BUG_ON(ppgtt->pd.pd_offset & 0x3f);
-
return (ppgtt->pd.pd_offset / 64) << 16;
}
@@ -1120,11 +1119,15 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd.pd_offset =
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
+ ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
+ ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+ gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
ppgtt->node.size >> 20,
ppgtt->node.start / PAGE_SIZE);
- gen6_map_page_tables(ppgtt);
DRM_DEBUG("Adding PPGTT at offset %x\n",
ppgtt->pd.pd_offset << 10);
@@ -1402,13 +1405,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
/* TODO: Perhaps it shouldn't be gen6 specific */
- if (i915_is_ggtt(vm)) {
- if (dev_priv->mm.aliasing_ppgtt)
- gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
- continue;
- }
- gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (i915_is_ggtt(vm))
+ ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+ gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
}
i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 329f75f..f3bdd40 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -288,6 +288,8 @@ struct i915_hw_ppgtt {
struct drm_i915_file_private *file_priv;
+ gen6_gtt_pte_t __iomem *pd_addr;
+
int (*enable)(struct i915_hw_ppgtt *ppgtt);
int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
struct intel_engine_cs *ring);
--
2.0.3
* [RFC 20/38] drm/i915: Clean up pagetable DMA map & unmap
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (18 preceding siblings ...)
2014-10-07 17:11 ` [RFC 19/38] drm/i915: Generalize GEN6 mapping Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 21/38] drm/i915: Always dma map page table allocations Michel Thierry
` (19 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Map and unmap are common operations across all generations for
pagetables. With a simple helper, we can get a nice net code reduction
as well as reduced complexity.
There is some room for optimization here; for instance, the multiple
page mappings could be done in one pci_map operation. In that case
however, the max value we'll ever see there is 512, and so I believe the
simpler code makes this a worthwhile trade-off. Also, the range mapping
functions are placeholders to help transition the code. Eventually,
mapping will only occur during a page allocation, which will always be a
discrete operation.
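The core of the range helpers is the unwind-on-failure pattern: map each element in turn, and on the first failure unmap the already-mapped prefix before returning. A minimal standalone sketch of that pattern (with `map_one`/`unmap_one`/`failing_index` as hypothetical stand-ins for the pci_map_page path in `dma_map_pt_range`/`dma_unmap_pt_range`):

```c
#include <stddef.h>

enum { NUM_ENTRIES = 8 };

struct entry {
	int mapped;
};

/* Fail at a caller-chosen index so the unwind path can be exercised;
 * -1 means every mapping succeeds. */
static int failing_index = -1;

static int map_one(struct entry *e, int idx)
{
	if (idx == failing_index)
		return -1;
	e->mapped = 1;
	return 0;
}

static void unmap_one(struct entry *e)
{
	e->mapped = 0;
}

/* Map [first, first + n); on failure, unmap the already-mapped prefix,
 * mirroring dma_map_pt_range()'s cleanup via dma_unmap_pt_range(). */
static int map_range(struct entry *entries, int first, size_t n)
{
	int i;

	for (i = first; i < first + (int)n; i++) {
		if (map_one(&entries[i], i)) {
			while (i-- > first)
				unmap_one(&entries[i]);
			return -1;
		}
	}
	return 0;
}
```

After a failed `map_range`, every entry in the requested range is back in its unmapped state, so the caller never has to track partial progress.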
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 147 +++++++++++++++++++++---------------
1 file changed, 85 insertions(+), 62 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2f01601..7c500f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -210,6 +210,76 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
return pte;
}
+#define dma_unmap_pt_single(pt, dev) do { \
+ pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+} while (0)
+
+
+static void dma_unmap_pt_range(struct i915_pagedir *pd,
+ unsigned pde, size_t n,
+ struct drm_device *dev)
+{
+ if (WARN_ON(pde + n > I915_PDES_PER_PD))
+ n = I915_PDES_PER_PD - pde;
+
+ n += pde;
+
+ for (; pde < n; pde++)
+ dma_unmap_pt_single(pd->page_tables[pde], dev);
+}
+
+/**
+ * dma_map_pt_single() - Create a dma mapping for a page table
+ * @pt: Page table to get a DMA map for
+ * @dev: drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping.
+ *
+ * Return: 0 on success.
+ */
+static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+{
+ struct page *page;
+ dma_addr_t pt_addr;
+ int ret;
+
+ page = pt->page;
+ pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
+ PCI_DMA_BIDIRECTIONAL);
+
+ ret = pci_dma_mapping_error(dev->pdev, pt_addr);
+ if (ret)
+ return ret;
+
+ pt->daddr = pt_addr;
+
+ return 0;
+}
+
+static int dma_map_pt_range(struct i915_pagedir *pd,
+ unsigned pde, size_t n,
+ struct drm_device *dev)
+{
+ const int first = pde;
+
+ if (WARN_ON(pde + n > I915_PDES_PER_PD))
+ n = I915_PDES_PER_PD - pde;
+
+ n += pde;
+
+ for (; pde < n; pde++) {
+ int ret;
+ ret = dma_map_pt_single(pd->page_tables[pde], dev);
+ if (ret) {
+ dma_unmap_pt_range(pd, first, pde, dev);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static void free_pt_single(struct i915_pagetab *pt)
{
if (WARN_ON(!pt->page))
@@ -218,7 +288,7 @@ static void free_pt_single(struct i915_pagetab *pt)
kfree(pt);
}
-static struct i915_pagetab *alloc_pt_single(void)
+static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
{
struct i915_pagetab *pt;
@@ -241,6 +311,7 @@ static struct i915_pagetab *alloc_pt_single(void)
* available to point to the allocated page tables.
* @pde: First page directory entry for which we are allocating.
* @count: Number of pages to allocate.
+ * @dev: DRM device used for DMA mapping.
*
* Allocates multiple page table pages and sets the appropriate entries in the
* page table structure within the page directory. Function cleans up after
@@ -248,7 +319,8 @@ static struct i915_pagetab *alloc_pt_single(void)
*
* Return: 0 if allocation succeeded.
*/
-static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
+ struct drm_device *dev)
{
int i, ret;
@@ -258,7 +330,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
BUG_ON(pde + count > I915_PDES_PER_PD);
for (i = pde; i < pde + count; i++) {
- struct i915_pagetab *pt = alloc_pt_single();
+ struct i915_pagetab *pt = alloc_pt_single(dev);
if (IS_ERR(pt)) {
ret = PTR_ERR(pt);
goto err_out;
@@ -503,7 +575,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
- 0, I915_PDES_PER_PD);
+ 0, I915_PDES_PER_PD, ppgtt->base.dev);
if (ret)
goto unwind_out;
}
@@ -582,27 +654,6 @@ static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
return 0;
}
-static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
- const int pdpe,
- const int pde)
-{
- dma_addr_t pt_addr;
- struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
- struct i915_pagetab *pt = pd->page_tables[pde];
- struct page *p = pt->page;
- int ret;
-
- pt_addr = pci_map_page(ppgtt->base.dev->pdev,
- p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
- ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
- if (ret)
- return ret;
-
- pt->daddr = pt_addr;
-
- return 0;
-}
-
/**
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
* with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -631,12 +682,15 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
* 2. Create DMA mappings for the page directories and page tables.
*/
for (i = 0; i < max_pdp; i++) {
+ struct i915_pagedir *pd;
ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
if (ret)
goto bail;
+ pd = ppgtt->pdp.pagedir[i];
+
for (j = 0; j < I915_PDES_PER_PD; j++) {
- ret = gen8_ppgtt_setup_page_tables(ppgtt, i, j);
+ ret = dma_map_pt_single(pd->page_tables[j], ppgtt->base.dev);
if (ret)
goto bail;
}
@@ -970,16 +1024,6 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
kunmap_atomic(pt_vaddr);
}
-static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
- int i;
-
- for (i = 0; i < ppgtt->num_pd_entries; i++)
- pci_unmap_page(ppgtt->base.dev->pdev,
- ppgtt->pd.page_tables[i]->daddr,
- 4096, PCI_DMA_BIDIRECTIONAL);
-}
-
static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
int i;
@@ -997,7 +1041,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
drm_mm_remove_node(&ppgtt->node);
- gen6_ppgtt_dma_unmap_pages(ppgtt);
+ dma_unmap_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, vm->dev);
gen6_ppgtt_free(ppgtt);
}
@@ -1050,7 +1094,8 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
if (ret)
return ret;
- ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries);
+ ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+ ppgtt->base.dev);
if (ret) {
drm_mm_remove_node(&ppgtt->node);
return ret;
@@ -1059,29 +1104,6 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
return 0;
}
-static int gen6_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt)
-{
- struct drm_device *dev = ppgtt->base.dev;
- int i;
-
- for (i = 0; i < ppgtt->num_pd_entries; i++) {
- struct page *page;
- dma_addr_t pt_addr;
-
- page = ppgtt->pd.page_tables[i]->page;
- pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
- PCI_DMA_BIDIRECTIONAL);
-
- if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
- gen6_ppgtt_dma_unmap_pages(ppgtt);
- return -EIO;
- }
-
- ppgtt->pd.page_tables[i]->daddr = pt_addr;
- }
-
- return 0;
-}
static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
{
@@ -1103,7 +1125,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
if (ret)
return ret;
- ret = gen6_ppgtt_setup_page_tables(ppgtt);
+ ret = dma_map_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
+ ppgtt->base.dev);
if (ret) {
gen6_ppgtt_free(ppgtt);
return ret;
--
2.0.3
* [RFC 21/38] drm/i915: Always dma map page table allocations
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (19 preceding siblings ...)
2014-10-07 17:11 ` [RFC 20/38] drm/i915: Clean up pagetable DMA map & unmap Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 22/38] drm/i915: Consolidate dma mappings Michel Thierry
` (18 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
There is never a case where we don't want to do it. Since we've broken
up the allocations into nice clean helper functions, it's both easy and
obvious to do the dma mapping at the same time.
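The resulting allocator shape is worth spelling out: allocation and mapping happen in one step, and each failure frees everything acquired so far, so the caller never sees a half-initialized table. A hypothetical standalone sketch (with `calloc`/`free` and `fake_dma_map` standing in for `kzalloc`, `alloc_page`, and the pci_map_page path of the revised `alloc_pt_single`):

```c
#include <stdlib.h>

struct pagetab {
	void *page;
	unsigned long daddr;
};

static int fail_mapping; /* toggle to exercise the error path */

static int fake_dma_map(struct pagetab *pt)
{
	if (fail_mapping)
		return -1;
	pt->daddr = (unsigned long)pt->page; /* identity "mapping" */
	return 0;
}

/* Allocate and map in one step, unwinding fully on any failure.
 * Returns NULL on error (the kernel code returns ERR_PTR instead). */
static struct pagetab *alloc_pt(void)
{
	struct pagetab *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return NULL;

	pt->page = calloc(1, 4096);
	if (!pt->page) {
		free(pt);
		return NULL;
	}

	if (fake_dma_map(pt)) {
		free(pt->page);
		free(pt);
		return NULL;
	}

	return pt;
}
```

Because mapping lives inside the allocator, the separate `dma_map_pt_range` call sites (and their failure handling) can be deleted, which is where most of the diffstat reduction comes from.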
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 78 ++++++++-----------------------------
1 file changed, 17 insertions(+), 61 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7c500f8..1fd2575 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -214,20 +214,6 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
} while (0)
-
-static void dma_unmap_pt_range(struct i915_pagedir *pd,
- unsigned pde, size_t n,
- struct drm_device *dev)
-{
- if (WARN_ON(pde + n > I915_PDES_PER_PD))
- n = I915_PDES_PER_PD - pde;
-
- n += pde;
-
- for (; pde < n; pde++)
- dma_unmap_pt_single(pd->page_tables[pde], dev);
-}
-
/**
* dma_map_pt_single() - Create a dma mapping for a page table
* @pt: Page table to get a DMA map for
@@ -257,33 +243,12 @@ static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
return 0;
}
-static int dma_map_pt_range(struct i915_pagedir *pd,
- unsigned pde, size_t n,
- struct drm_device *dev)
-{
- const int first = pde;
-
- if (WARN_ON(pde + n > I915_PDES_PER_PD))
- n = I915_PDES_PER_PD - pde;
-
- n += pde;
-
- for (; pde < n; pde++) {
- int ret;
- ret = dma_map_pt_single(pd->page_tables[pde], dev);
- if (ret) {
- dma_unmap_pt_range(pd, first, pde, dev);
- return ret;
- }
- }
-
- return 0;
-}
-
-static void free_pt_single(struct i915_pagetab *pt)
+static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
{
if (WARN_ON(!pt->page))
return;
+
+ dma_unmap_pt_single(pt, dev);
__free_page(pt->page);
kfree(pt);
}
@@ -291,6 +256,7 @@ static void free_pt_single(struct i915_pagetab *pt)
static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
{
struct i915_pagetab *pt;
+ int ret;
pt = kzalloc(sizeof(*pt), GFP_KERNEL);
if (!pt)
@@ -302,6 +268,13 @@ static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
return ERR_PTR(-ENOMEM);
}
+ ret = dma_map_pt_single(pt, dev);
+ if (ret) {
+ __free_page(pt->page);
+ kfree(pt);
+ return ERR_PTR(ret);
+ }
+
return pt;
}
@@ -345,7 +318,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
err_out:
while (i--)
- free_pt_single(pd->page_tables[i]);
+ free_pt_single(pd->page_tables[i], dev);
return ret;
}
@@ -512,7 +485,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
}
}
-static void gen8_free_page_tables(struct i915_pagedir *pd)
+static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
{
int i;
@@ -520,7 +493,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
return;
for (i = 0; i < I915_PDES_PER_PD; i++) {
- free_pt_single(pd->page_tables[i]);
+ free_pt_single(pd->page_tables[i], dev);
pd->page_tables[i] = NULL;
}
}
@@ -530,7 +503,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+ gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
free_pd_single(ppgtt->pdp.pagedir[i]);
}
}
@@ -584,7 +557,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
unwind_out:
while (i--)
- gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+ gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
return -ENOMEM;
}
@@ -682,18 +655,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
* 2. Create DMA mappings for the page directories and page tables.
*/
for (i = 0; i < max_pdp; i++) {
- struct i915_pagedir *pd;
ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
if (ret)
goto bail;
-
- pd = ppgtt->pdp.pagedir[i];
-
- for (j = 0; j < I915_PDES_PER_PD; j++) {
- ret = dma_map_pt_single(pd->page_tables[j], ppgtt->base.dev);
- if (ret)
- goto bail;
- }
}
/*
@@ -1029,7 +993,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_entries; i++)
- free_pt_single(ppgtt->pd.page_tables[i]);
+ free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
free_pd_single(&ppgtt->pd);
}
@@ -1041,7 +1005,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
drm_mm_remove_node(&ppgtt->node);
- dma_unmap_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, vm->dev);
gen6_ppgtt_free(ppgtt);
}
@@ -1125,13 +1088,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
if (ret)
return ret;
- ret = dma_map_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
- ppgtt->base.dev);
- if (ret) {
- gen6_ppgtt_free(ppgtt);
- return ret;
- }
-
ppgtt->base.clear_range = gen6_ppgtt_clear_range;
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
--
2.0.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [RFC 22/38] drm/i915: Consolidate dma mappings
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (20 preceding siblings ...)
2014-10-07 17:11 ` [RFC 21/38] drm/i915: Always dma map page table allocations Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 23/38] drm/i915: Always dma map page directory allocations Michel Thierry
` (17 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
With a little bit of macro magic, and the fact that every page
table/dir/etc. we wish to map has a page and a daddr member, we can
greatly simplify and reduce the code.
The patch introduces i915_dma_map/unmap, which have the same semantics
as pci_map_page but take a single line and don't require extra newlines
or local variables to fit cleanly.
Notice that even the page allocation shares this same attribute. For
now, I am leaving that code untouched because the macro version would be
a bit on the big side, but it would be a nice cleanup as well (IMO).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 56 ++++++++++++-------------------------
1 file changed, 18 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1fd2575..3bb728f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -210,45 +210,33 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
return pte;
}
-#define dma_unmap_pt_single(pt, dev) do { \
- pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+#define i915_dma_unmap_single(px, dev) do { \
+ pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
} while (0);
/**
- * dma_map_pt_single() - Create a dma mapping for a page table
- * @pt: Page table to get a DMA map for
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px: Page table/dir/etc to get a DMA map for
* @dev: drm device
*
* Page table allocations are unified across all gens. They always require a
- * single 4k allocation, as well as a DMA mapping.
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
*
* Return: 0 if success.
*/
-static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
-{
- struct page *page;
- dma_addr_t pt_addr;
- int ret;
-
- page = pt->page;
- pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
- PCI_DMA_BIDIRECTIONAL);
-
- ret = pci_dma_mapping_error(dev->pdev, pt_addr);
- if (ret)
- return ret;
-
- pt->daddr = pt_addr;
-
- return 0;
-}
+#define i915_dma_map_px_single(px, dev) \
+ pci_dma_mapping_error((dev)->pdev, \
+ (px)->daddr = pci_map_page((dev)->pdev, \
+ (px)->page, 0, 4096, \
+ PCI_DMA_BIDIRECTIONAL))
static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
{
if (WARN_ON(!pt->page))
return;
- dma_unmap_pt_single(pt, dev);
+ i915_dma_unmap_single(pt, dev);
__free_page(pt->page);
kfree(pt);
}
@@ -268,7 +256,7 @@ static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
return ERR_PTR(-ENOMEM);
}
- ret = dma_map_pt_single(pt, dev);
+ ret = i915_dma_map_px_single(pt, dev);
if (ret) {
__free_page(pt->page);
kfree(pt);
@@ -510,7 +498,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
{
- struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+ struct drm_device *dev = ppgtt->base.dev;
int i, j;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -519,16 +507,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
if (!ppgtt->pdp.pagedir[i]->daddr)
continue;
- pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
- PCI_DMA_BIDIRECTIONAL);
+ i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
for (j = 0; j < I915_PDES_PER_PD; j++) {
struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
struct i915_pagetab *pt = pd->page_tables[j];
dma_addr_t addr = pt->daddr;
if (addr)
- pci_unmap_page(hwdev, addr, PAGE_SIZE,
- PCI_DMA_BIDIRECTIONAL);
+ i915_dma_unmap_single(pt, dev);
}
}
}
@@ -611,19 +597,13 @@ err_out:
static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
const int pdpe)
{
- dma_addr_t pd_addr;
int ret;
- pd_addr = pci_map_page(ppgtt->base.dev->pdev,
- ppgtt->pdp.pagedir[pdpe]->page, 0,
- PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
- ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+ ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
+ ppgtt->base.dev);
if (ret)
return ret;
- ppgtt->pdp.pagedir[pdpe]->daddr = pd_addr;
-
return 0;
}
--
2.0.3
* [RFC 23/38] drm/i915: Always dma map page directory allocations
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (21 preceding siblings ...)
2014-10-07 17:11 ` [RFC 22/38] drm/i915: Consolidate dma mappings Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 24/38] drm/i915: Track GEN6 page table usage Michel Thierry
` (16 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Similar to the patch a few back in the series, we can always map and
unmap page directories when we do their allocation and teardown. Page
directory pages only exist on gen8+, so this should only affect behavior
on those platforms.
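Folding the mapping into allocation leans on the unwind idiom visible in the diff's `unwind_out:` labels: on failure, free exactly the entries allocated so far, in reverse order. A minimal userspace sketch of that idiom (all names illustrative):

```c
#include <stdlib.h>

#define NENTRIES 4

/* Allocate NENTRIES slots; fail_at injects a failure for testing
 * (pass a negative value for the success path). Mirrors the shape of
 * gen8_ppgtt_allocate_page_directories()'s unwind_out handling. */
static int alloc_all(void **slots, int fail_at)
{
	int i;

	for (i = 0; i < NENTRIES; i++) {
		slots[i] = (i == fail_at) ? NULL : malloc(16);
		if (!slots[i])
			goto unwind_out;
	}
	return 0;

unwind_out:
	/* i currently indexes the failed slot; walk back over the
	 * successfully allocated ones only. */
	while (i--) {
		free(slots[i]);
		slots[i] = NULL;
	}
	return -1; /* standing in for -ENOMEM */
}
```

The `while (i--)` form frees entries `i-1 .. 0` without touching the slot that failed, which is exactly what the patch's `free_pd_single()` unwind relies on.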
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 79 +++++++++----------------------------
1 file changed, 19 insertions(+), 60 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3bb728f..54fbd87 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -310,21 +310,23 @@ err_out:
return ret;
}
-static void __free_pd_single(struct i915_pagedir *pd)
+static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
{
+ i915_dma_unmap_single(pd, dev);
__free_page(pd->page);
kfree(pd);
}
-#define free_pd_single(pd) do { \
+#define free_pd_single(pd, dev) do { \
if ((pd)->page) { \
- __free_pd_single(pd); \
+ __free_pd_single(pd, dev); \
} \
} while (0)
-static struct i915_pagedir *alloc_pd_single(void)
+static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
{
struct i915_pagedir *pd;
+ int ret;
pd = kzalloc(sizeof(*pd), GFP_KERNEL);
if (!pd)
@@ -336,6 +338,13 @@ static struct i915_pagedir *alloc_pd_single(void)
return ERR_PTR(-ENOMEM);
}
+ ret = i915_dma_map_px_single(pd, dev);
+ if (ret) {
+ __free_page(pd->page);
+ kfree(pd);
+ return ERR_PTR(ret);
+ }
+
return pd;
}
@@ -492,30 +501,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
- free_pd_single(ppgtt->pdp.pagedir[i]);
- }
-}
-
-static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
- struct drm_device *dev = ppgtt->base.dev;
- int i, j;
-
- for (i = 0; i < ppgtt->num_pd_pages; i++) {
- /* TODO: In the future we'll support sparse mappings, so this
- * will have to change. */
- if (!ppgtt->pdp.pagedir[i]->daddr)
- continue;
-
- i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
-
- for (j = 0; j < I915_PDES_PER_PD; j++) {
- struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
- struct i915_pagetab *pt = pd->page_tables[j];
- dma_addr_t addr = pt->daddr;
- if (addr)
- i915_dma_unmap_single(pt, dev);
- }
+ free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
}
}
@@ -524,7 +510,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- gen8_ppgtt_dma_unmap_pages(ppgtt);
gen8_ppgtt_free(ppgtt);
}
@@ -554,7 +539,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
int i;
for (i = 0; i < max_pdp; i++) {
- ppgtt->pdp.pagedir[i] = alloc_pd_single();
+ ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
if (IS_ERR(ppgtt->pdp.pagedir[i]))
goto unwind_out;
}
@@ -566,7 +551,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
unwind_out:
while (i--)
- free_pd_single(ppgtt->pdp.pagedir[i]);
+ free_pd_single(ppgtt->pdp.pagedir[i],
+ ppgtt->base.dev);
return -ENOMEM;
}
@@ -594,19 +580,6 @@ err_out:
return ret;
}
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
- const int pdpe)
-{
- int ret;
-
- ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
- ppgtt->base.dev);
- if (ret)
- return ret;
-
- return 0;
-}
-
/**
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
* with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -632,16 +605,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
return ret;
/*
- * 2. Create DMA mappings for the page directories and page tables.
- */
- for (i = 0; i < max_pdp; i++) {
- ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
- if (ret)
- goto bail;
- }
-
- /*
- * 3. Map all the page directory entires to point to the page tables
+ * 2. Map all the page directory entires to point to the page tables
* we've allocated.
*
* For now, the PPGTT helper functions all require that the PDEs are
@@ -676,11 +640,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->num_pd_entries,
(ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
return 0;
-
-bail:
- gen8_ppgtt_dma_unmap_pages(ppgtt);
- gen8_ppgtt_free(ppgtt);
- return ret;
}
static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
@@ -975,7 +934,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_entries; i++)
free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
- free_pd_single(&ppgtt->pd);
+ free_pd_single(&ppgtt->pd, ppgtt->base.dev);
}
static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
--
2.0.3
* [RFC 24/38] drm/i915: Track GEN6 page table usage
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (22 preceding siblings ...)
2014-10-07 17:11 ` [RFC 23/38] drm/i915: Always dma map page directory allocations Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 25/38] drm/i915: Extract context switch skip logic Michel Thierry
` (15 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
With the current patch there isn't much in the way of making a
gen-agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.
Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.
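The tracking described above boils down to a per-page-table bitmap with one bit per PTE. A userspace sketch of the bookkeeping, with the kernel's bitmap helpers re-implemented trivially and an illustrative table size (the real GEN6/GEN8 PTE counts differ):

```c
#include <limits.h>

#define PTES_PER_PT 512 /* illustrative; see GEN6/GEN8_PTES_PER_PT */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS ((PTES_PER_PT + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Mark [start, start + count) as used, like bitmap_set(). Repeated
 * binds simply OR in more bits, matching the patch's bitmap_or() step. */
static void used_ptes_set(unsigned long *map, unsigned start, unsigned count)
{
	for (unsigned i = start; i < start + count; i++)
		map[i / BITS_PER_ULONG] |= 1UL << (i % BITS_PER_ULONG);
}

/* Count used PTEs, like bitmap_weight(); this is what the free-path
 * WARN consults to catch freeing a table with live entries. */
static unsigned used_ptes_weight(const unsigned long *map)
{
	unsigned w = 0;

	for (unsigned i = 0; i < PTES_PER_PT; i++)
		if (map[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG)))
			w++;
	return w;
}
```

With this in place, teardown is just `bitmap_clear()` over the unbound range, and a nonzero weight at free time flags the "Free page table with N used pages" condition.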
v2: s/pdp.pagedir/pdp.pagedirs
Make a scratch page allocation helper
v3: For lrc, s/pdp.pagedir/pdp.pagedirs/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 202 ++++++++++++++++++++++++++++--------
drivers/gpu/drm/i915/i915_gem_gtt.h | 115 ++++++++++++--------
drivers/gpu/drm/i915/intel_lrc.c | 16 +--
3 files changed, 237 insertions(+), 96 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 54fbd87..a2686a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -70,10 +70,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
return has_full_ppgtt ? 2 : has_aliasing_ppgtt ? 1 : 0;
}
-
-static void ppgtt_bind_vma(struct i915_vma *vma,
- enum i915_cache_level cache_level,
- u32 flags);
+static int ppgtt_bind_vma(struct i915_vma *vma,
+ enum i915_cache_level cache_level,
+ u32 flags);
static void ppgtt_unbind_vma(struct i915_vma *vma);
static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr,
@@ -231,37 +230,78 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
(px)->page, 0, 4096, \
PCI_DMA_BIDIRECTIONAL))
-static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
+ int scratch)
{
+ if (WARN(scratch ^ pt->scratch,
+ "Tried to free scratch = %d. Is scratch = %d\n",
+ scratch, pt->scratch))
+ return;
+
if (WARN_ON(!pt->page))
return;
+ if (!scratch) {
+ const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+ GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+ WARN(!bitmap_empty(pt->used_ptes, count),
+ "Free page table with %d used pages\n",
+ bitmap_weight(pt->used_ptes, count));
+ }
+
i915_dma_unmap_single(pt, dev);
__free_page(pt->page);
+ kfree(pt->used_ptes);
kfree(pt);
}
+#define free_pt_single(pt, dev) \
+ __free_pt_single(pt, dev, false)
+#define free_pt_scratch(pt, dev) \
+ __free_pt_single(pt, dev, true)
+
static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
{
struct i915_pagetab *pt;
- int ret;
+ const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+ GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+ int ret = -ENOMEM;
pt = kzalloc(sizeof(*pt), GFP_KERNEL);
if (!pt)
return ERR_PTR(-ENOMEM);
+ pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+ GFP_KERNEL);
+
+ if (!pt->used_ptes)
+ goto fail_bitmap;
+
pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
- if (!pt->page) {
- kfree(pt);
- return ERR_PTR(-ENOMEM);
- }
+ if (!pt->page)
+ goto fail_page;
ret = i915_dma_map_px_single(pt, dev);
- if (ret) {
- __free_page(pt->page);
- kfree(pt);
- return ERR_PTR(ret);
- }
+ if (ret)
+ goto fail_dma;
+
+ return pt;
+
+fail_dma:
+ __free_page(pt->page);
+fail_page:
+ kfree(pt->used_ptes);
+fail_bitmap:
+ kfree(pt);
+
+ return ERR_PTR(ret);
+}
+
+static inline struct i915_pagetab *alloc_pt_scratch(struct drm_device *dev)
+{
+ struct i915_pagetab *pt = alloc_pt_single(dev);
+ if (!IS_ERR(pt))
+ pt->scratch = 1;
return pt;
}
@@ -380,7 +420,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
for (i = used_pd - 1; i >= 0; i--) {
- dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
+ dma_addr_t addr = ppgtt->pdp.pagedirs[i]->daddr;
ret = gen8_write_pdp(ring, i, addr);
if (ret)
return ret;
@@ -407,7 +447,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
I915_CACHE_LLC, use_scratch);
while (num_entries) {
- struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+ struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
struct i915_pagetab *pt = pd->page_tables[pde];
struct page *page_table = pt->page;
@@ -454,7 +494,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
break;
if (pt_vaddr == NULL) {
- struct i915_pagedir *pd = ppgtt->pdp.pagedir[pdpe];
+ struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
struct i915_pagetab *pt = pd->page_tables[pde];
struct page *page_table = pt->page;
pt_vaddr = kmap_atomic(page_table);
@@ -500,8 +540,8 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
int i;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
- free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+ gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
+ free_pd_single(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
}
}
@@ -518,7 +558,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
int i, ret;
for (i = 0; i < ppgtt->num_pd_pages; i++) {
- ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
+ ret = alloc_pt_range(ppgtt->pdp.pagedirs[i],
0, I915_PDES_PER_PD, ppgtt->base.dev);
if (ret)
goto unwind_out;
@@ -528,7 +568,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
unwind_out:
while (i--)
- gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
+ gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
return -ENOMEM;
}
@@ -539,8 +579,8 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
int i;
for (i = 0; i < max_pdp; i++) {
- ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
- if (IS_ERR(ppgtt->pdp.pagedir[i]))
+ ppgtt->pdp.pagedirs[i] = alloc_pd_single(ppgtt->base.dev);
+ if (IS_ERR(ppgtt->pdp.pagedirs[i]))
goto unwind_out;
}
@@ -551,7 +591,7 @@ static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
unwind_out:
while (i--)
- free_pd_single(ppgtt->pdp.pagedir[i],
+ free_pd_single(ppgtt->pdp.pagedirs[i],
ppgtt->base.dev);
return -ENOMEM;
@@ -613,9 +653,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
* will never need to touch the PDEs again.
*/
for (i = 0; i < max_pdp; i++) {
- struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
+ struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
gen8_ppgtt_pde_t *pd_vaddr;
- pd_vaddr = kmap_atomic(ppgtt->pdp.pagedir[i]->page);
+ pd_vaddr = kmap_atomic(ppgtt->pdp.pagedirs[i]->page);
for (j = 0; j < I915_PDES_PER_PD; j++) {
struct i915_pagetab *pt = pd->page_tables[j];
dma_addr_t addr = pt->daddr;
@@ -713,15 +753,13 @@ static void gen6_map_single(struct i915_pagedir *pd,
/* Map all the page tables found in the ppgtt structure to incrementing page
* directories. */
static void gen6_map_page_range(struct drm_i915_private *dev_priv,
- struct i915_pagedir *pd, unsigned pde, size_t n)
+ struct i915_pagedir *pd, uint32_t start, uint32_t length)
{
- if (WARN_ON(pde + n > I915_PDES_PER_PD))
- n = I915_PDES_PER_PD - pde;
-
- n += pde;
+ struct i915_pagetab *pt;
+ uint32_t pde, temp;
- for (; pde < n; pde++)
- gen6_map_single(pd, pde, pd->page_tables[pde]);
+ gen6_for_each_pde(pt, pd, start, length, temp, pde)
+ gen6_map_single(pd, pde, pt);
/* Make sure write is complete before other code can use this page
* table. Also require for WC mapped PTEs */
@@ -927,6 +965,51 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
kunmap_atomic(pt_vaddr);
}
+static int gen6_alloc_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_pagetab *pt;
+ uint32_t pde, temp;
+
+ gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+ int j;
+
+ DECLARE_BITMAP(tmp_bitmap, GEN6_PTES_PER_PT);
+ bitmap_zero(tmp_bitmap, GEN6_PTES_PER_PT);
+ bitmap_set(tmp_bitmap, gen6_pte_index(start),
+ gen6_pte_count(start, length));
+
+ /* TODO: To be done in the next patch. Map the page/insert
+ * entries here */
+ for_each_set_bit(j, tmp_bitmap, GEN6_PTES_PER_PT) {
+ if (test_bit(j, pt->used_ptes)) {
+ /* Check that we're changing cache levels */
+ }
+ }
+
+ bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+ GEN6_PTES_PER_PT);
+ }
+
+ return 0;
+}
+
+static void gen6_teardown_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_pagetab *pt;
+ uint32_t pde, temp;
+
+ gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+ bitmap_clear(pt->used_ptes, gen6_pte_index(start),
+ gen6_pte_count(start, length));
+ }
+}
+
static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
int i;
@@ -934,6 +1017,7 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_entries; i++)
free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+ free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
free_pd_single(&ppgtt->pd, ppgtt->base.dev);
}
@@ -959,6 +1043,9 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
* size. We allocate at the top of the GTT to avoid fragmentation.
*/
BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
+ ppgtt->scratch_pt = alloc_pt_scratch(ppgtt->base.dev);
+ if (IS_ERR(ppgtt->scratch_pt))
+ return PTR_ERR(ppgtt->scratch_pt);
alloc:
ret = drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
&ppgtt->node, GEN6_PD_SIZE,
@@ -972,20 +1059,25 @@ alloc:
0, dev_priv->gtt.base.total,
0);
if (ret)
- return ret;
+ goto err_out;
retried = true;
goto alloc;
}
if (ret)
- return ret;
+ goto err_out;
+
if (ppgtt->node.start < dev_priv->gtt.mappable_end)
DRM_DEBUG("Forced to use aperture for PDEs\n");
ppgtt->num_pd_entries = I915_PDES_PER_PD;
return 0;
+
+err_out:
+ free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
+ return ret;
}
static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
@@ -1027,6 +1119,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
if (ret)
return ret;
+ ppgtt->base.allocate_va_range = gen6_alloc_va_range;
+ ppgtt->base.teardown_va_range = gen6_teardown_va_range;
ppgtt->base.clear_range = gen6_ppgtt_clear_range;
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
@@ -1040,7 +1134,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
- gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+ gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
ppgtt->node.size >> 20,
@@ -1156,17 +1250,28 @@ void i915_ppgtt_release(struct kref *kref)
kfree(ppgtt);
}
-static void
+static int
ppgtt_bind_vma(struct i915_vma *vma,
enum i915_cache_level cache_level,
u32 flags)
{
+ int ret;
+
/* Currently applicable only to VLV */
if (vma->obj->gt_ro)
flags |= PTE_READ_ONLY;
+ if (vma->vm->allocate_va_range) {
+ ret = vma->vm->allocate_va_range(vma->vm,
+ vma->node.start,
+ vma->node.size);
+ if (ret)
+ return ret;
+ }
+
vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
cache_level, flags);
+ return 0;
}
static void ppgtt_unbind_vma(struct i915_vma *vma)
@@ -1175,6 +1280,9 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
vma->node.start,
vma->obj->base.size,
true);
+ if (vma->vm->teardown_va_range)
+ vma->vm->teardown_va_range(vma->vm,
+ vma->node.start, vma->node.size);
}
extern int intel_iommu_gfx_mapped;
@@ -1495,9 +1603,9 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
}
-static void i915_ggtt_bind_vma(struct i915_vma *vma,
- enum i915_cache_level cache_level,
- u32 unused)
+static int i915_ggtt_bind_vma(struct i915_vma *vma,
+ enum i915_cache_level cache_level,
+ u32 unused)
{
const unsigned long entry = vma->node.start >> PAGE_SHIFT;
unsigned int flags = (cache_level == I915_CACHE_NONE) ?
@@ -1506,6 +1614,8 @@ static void i915_ggtt_bind_vma(struct i915_vma *vma,
BUG_ON(!i915_is_ggtt(vma->vm));
intel_gtt_insert_sg_entries(vma->obj->pages, entry, flags);
vma->obj->has_global_gtt_mapping = 1;
+
+ return 0;
}
static void i915_ggtt_clear_range(struct i915_address_space *vm,
@@ -1528,9 +1638,9 @@ static void i915_ggtt_unbind_vma(struct i915_vma *vma)
intel_gtt_clear_range(first, size);
}
-static void ggtt_bind_vma(struct i915_vma *vma,
- enum i915_cache_level cache_level,
- u32 flags)
+static int ggtt_bind_vma(struct i915_vma *vma,
+ enum i915_cache_level cache_level,
+ u32 flags)
{
struct drm_device *dev = vma->vm->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1562,7 +1672,7 @@ static void ggtt_bind_vma(struct i915_vma *vma,
}
if (!(flags & ALIASING_BIND))
- return;
+ return 0;
if (dev_priv->mm.aliasing_ppgtt &&
(!obj->has_aliasing_ppgtt_mapping ||
@@ -1574,6 +1684,8 @@ static void ggtt_bind_vma(struct i915_vma *vma,
cache_level, flags);
vma->obj->has_aliasing_ppgtt_mapping = 1;
}
+
+ return 0;
}
static void ggtt_unbind_vma(struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f3bdd40..eb225ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -172,9 +172,33 @@ struct i915_vma {
/* Only use this if you know you want a strictly aliased binding */
#define ALIASING_BIND (1<<1)
#define PTE_READ_ONLY (1<<2)
- void (*bind_vma)(struct i915_vma *vma,
- enum i915_cache_level cache_level,
- u32 flags);
+ int (*bind_vma)(struct i915_vma *vma,
+ enum i915_cache_level cache_level,
+ u32 flags);
+};
+
+
+struct i915_pagetab {
+ struct page *page;
+ dma_addr_t daddr;
+
+ unsigned long *used_ptes;
+ unsigned int scratch:1;
+};
+
+struct i915_pagedir {
+ struct page *page; /* NULL for GEN6-GEN7 */
+ union {
+ uint32_t pd_offset;
+ dma_addr_t daddr;
+ };
+
+ struct i915_pagetab *page_tables[I915_PDES_PER_PD];
+};
+
+struct i915_pagedirpo {
+ /* struct page *page; */
+ struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
};
struct i915_address_space {
@@ -216,6 +240,12 @@ struct i915_address_space {
gen6_gtt_pte_t (*pte_encode)(dma_addr_t addr,
enum i915_cache_level level,
bool valid, u32 flags); /* Create a valid PTE */
+ int (*allocate_va_range)(struct i915_address_space *vm,
+ uint64_t start,
+ uint64_t length);
+ void (*teardown_va_range)(struct i915_address_space *vm,
+ uint64_t start,
+ uint64_t length);
void (*clear_range)(struct i915_address_space *vm,
uint64_t start,
uint64_t length,
@@ -227,6 +257,29 @@ struct i915_address_space {
void (*cleanup)(struct i915_address_space *vm);
};
+struct i915_hw_ppgtt {
+ struct i915_address_space base;
+ struct kref ref;
+ struct drm_mm_node node;
+ unsigned num_pd_entries;
+ unsigned num_pd_pages; /* gen8+ */
+ union {
+ struct i915_pagedirpo pdp;
+ struct i915_pagedir pd;
+ };
+
+ struct i915_pagetab *scratch_pt;
+
+ struct drm_i915_file_private *file_priv;
+
+ gen6_gtt_pte_t __iomem *pd_addr;
+
+ int (*enable)(struct i915_hw_ppgtt *ppgtt);
+ int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
+ struct intel_engine_cs *ring);
+ void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
+
/* The Graphics Translation Table is the way in which GEN hardware translates a
* Graphics Virtual Address into a Physical Address. In addition to the normal
* collateral associated with any va->pa translations GEN hardware also has a
@@ -255,46 +308,22 @@ struct i915_gtt {
unsigned long *mappable_end);
};
-struct i915_pagetab {
- struct page *page;
- dma_addr_t daddr;
-};
-
-struct i915_pagedir {
- struct page *page; /* NULL for GEN6-GEN7 */
- union {
- uint32_t pd_offset;
- dma_addr_t daddr;
- };
-
- struct i915_pagetab *page_tables[I915_PDES_PER_PD]; /* PDEs */
-};
-
-struct i915_pagedirpo {
- /* struct page *page; */
- struct i915_pagedir *pagedir[GEN8_LEGACY_PDPES];
-};
-
-struct i915_hw_ppgtt {
- struct i915_address_space base;
- struct kref ref;
- struct drm_mm_node node;
- unsigned num_pd_entries;
- unsigned num_pd_pages; /* gen8+ */
- union {
- struct i915_pagedirpo pdp;
- struct i915_pagedir pd;
- };
-
- struct drm_i915_file_private *file_priv;
-
- gen6_gtt_pte_t __iomem *pd_addr;
-
- int (*enable)(struct i915_hw_ppgtt *ppgtt);
- int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
- struct intel_engine_cs *ring);
- void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
-};
+/* For each pde iterates over every pde between from start until start + length.
+ * If start, and start+length are not perfectly divisible, the macro will round
+ * down, and up as needed. The macro modifies pde, start, and length. Dev is
+ * only used to differentiate shift values. Temp is temp. On gen6/7, start = 0,
+ * and length = 2G effectively iterates over every PDE in the system. On gen8+
+ * it simply iterates over every page directory entry in a page directory.
+ *
+ * XXX: temp is not actually needed, but it saves doing the ALIGN operation.
+ */
+#define gen6_for_each_pde(pt, pd, start, length, temp, iter) \
+ for (iter = gen6_pde_index(start), pt = (pd)->page_tables[iter]; \
+ length > 0 && iter < I915_PDES_PER_PD; \
+ pt = (pd)->page_tables[++iter], \
+ temp = ALIGN(start+1, 1 << GEN6_PDE_SHIFT) - start, \
+ temp = min(temp, (unsigned)length), \
+ start += temp, length -= temp)
static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
{
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6607f56..b426fe6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1719,14 +1719,14 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[3]->daddr);
- reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[3]->daddr);
- reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[2]->daddr);
- reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[2]->daddr);
- reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[1]->daddr);
- reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[1]->daddr);
- reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedir[0]->daddr);
- reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedir[0]->daddr);
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
--
2.0.3
* [RFC 25/38] drm/i915: Extract context switch skip logic
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (23 preceding siblings ...)
2014-10-07 17:11 ` [RFC 24/38] drm/i915: Track GEN6 page table usage Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 26/38] drm/i915: Track page table reload need Michel Thierry
` (14 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
We have some fanciness coming up. This patch just breaks out the logic.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_context.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 813af4c..4b11c64 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -675,6 +675,16 @@ unpin_out:
return ret;
}
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+ struct intel_context *from,
+ struct intel_context *to)
+{
+ if (from == to && !to->remap_slice)
+ return true;
+
+ return false;
+}
+
/**
* i915_switch_context() - perform a GPU context switch.
* @ring: ring for which we'll execute the context switch
--
2.0.3
* [RFC 26/38] drm/i915: Track page table reload need
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (24 preceding siblings ...)
2014-10-07 17:11 ` [RFC 25/38] drm/i915: Extract context switch skip logic Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 27/38] drm/i915: Initialize all contexts Michel Thierry
` (13 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.
The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.
GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.
Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.
The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.
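The bookkeeping described above can be sketched in standalone C. This is
an illustrative model only (the names and the ring count are assumptions,
not the driver's actual structures): a bit per ring is set whenever new
PDEs are mapped, and cleared once that ring has emitted a PD load, which
is what lets do_switch skip redundant context switches.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 5	/* illustrative: RCS, VCS, BCS, VECS, VCS2 */

/* One bit per ring: set when that ring must reload its page directory. */
static unsigned long pd_reload_mask;

/* Called when a new object is mapped into the address space: the GPU
 * does not snoop the new mapping, so every ring must reload before it
 * can coherently translate through the new PDEs. */
static void ppgtt_invalidate_tlbs(void)
{
	pd_reload_mask = (1UL << NUM_RINGS) - 1;
}

/* Called after a PD load (or MI_FORCE_RESTORE) was emitted on a ring. */
static void pd_reloaded(int ring_id)
{
	pd_reload_mask &= ~(1UL << ring_id);
}

/* The context switch may be skipped only if nothing is pending. */
static bool should_skip_switch(int ring_id, bool same_ctx)
{
	return same_ctx && !(pd_reload_mask & (1UL << ring_id));
}
```

Note how this matches the patch's invariant: a switch back to the same
context is only a no-op while that ring's reload bit is clear.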
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
squash! drm/i915: Force pd restore when PDEs change, gen6-7
It's not just for gen8. If the current context's mappings change, we
need a context reload on the next switch.
v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_gem_context.c | 67 +++++++++++++++++++++---------
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++
drivers/gpu/drm/i915/i915_gem_gtt.c | 15 ++++++-
drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +
4 files changed, 75 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 4b11c64..7218849 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -515,6 +515,42 @@ mi_set_context(struct intel_engine_cs *ring,
return ret;
}
+static inline bool should_skip_switch(struct intel_engine_cs *ring,
+ struct intel_context *from,
+ struct intel_context *to)
+{
+ struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+ if (to->remap_slice)
+ return false;
+
+ if (to->ppgtt) {
+ if (from == to && !test_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+ return true;
+ } else {
+ if (from == to && !test_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask))
+ return true;
+ }
+
+ return false;
+}
+
+static bool
+needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
+{
+ struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+ return ((INTEL_INFO(ring->dev)->gen < 8) ||
+ (ring != &dev_priv->ring[RCS])) && to->ppgtt;
+}
+
+static bool
+needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
+{
+ return IS_GEN8(ring->dev) &&
+ (to->ppgtt || &to->ppgtt->base.pd_reload_mask);
+}
+
static int do_switch(struct intel_engine_cs *ring,
struct intel_context *to)
{
@@ -522,9 +558,6 @@ static int do_switch(struct intel_engine_cs *ring,
struct intel_context *from = ring->last_context;
u32 hw_flags = 0;
bool uninitialized = false;
- bool needs_pd_load_pre = ((INTEL_INFO(ring->dev)->gen < 8) ||
- (ring != &dev_priv->ring[RCS])) && to->ppgtt;
- bool needs_pd_load_post = false;
int ret, i;
if (from != NULL && ring == &dev_priv->ring[RCS]) {
@@ -532,7 +565,7 @@ static int do_switch(struct intel_engine_cs *ring,
BUG_ON(!i915_gem_obj_is_pinned(from->legacy_hw_ctx.rcs_state));
}
- if (from == to && !to->remap_slice)
+ if (should_skip_switch(ring, from, to))
return 0;
/* Trying to pin first makes error handling easier. */
@@ -550,7 +583,7 @@ static int do_switch(struct intel_engine_cs *ring,
*/
from = ring->last_context;
- if (needs_pd_load_pre) {
+ if (needs_pd_load_pre(ring, to)) {
/* Older GENs and non render rings still want the load first,
* "PP_DCLV followed by PP_DIR_BASE register through Load
* Register Immediate commands in Ring Buffer before submitting
@@ -558,6 +591,12 @@ static int do_switch(struct intel_engine_cs *ring,
ret = to->ppgtt->switch_mm(to->ppgtt, ring);
if (ret)
goto unpin_out;
+
+ /* Doing a PD load always reloads the page dirs */
+ if (to->ppgtt)
+ clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask);
+ else
+ clear_bit(ring->id, &dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask);
}
if (ring != &dev_priv->ring[RCS]) {
@@ -590,16 +629,16 @@ static int do_switch(struct intel_engine_cs *ring,
* XXX: If we implemented page directory eviction code, this
* optimization needs to be removed.
*/
- if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
+ if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
hw_flags |= MI_RESTORE_INHIBIT;
- needs_pd_load_post = to->ppgtt && IS_GEN8(ring->dev);
- }
+ else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+ hw_flags |= MI_FORCE_RESTORE;
ret = mi_set_context(ring, to, hw_flags);
if (ret)
goto unpin_out;
- if (needs_pd_load_post) {
+ if (needs_pd_load_post(ring, to)) {
ret = to->ppgtt->switch_mm(to->ppgtt, ring);
/* The hardware context switch is emitted, but we haven't
* actually changed the state - so it's probably safe to bail
@@ -675,16 +714,6 @@ unpin_out:
return ret;
}
-static inline bool should_skip_switch(struct intel_engine_cs *ring,
- struct intel_context *from,
- struct intel_context *to)
-{
- if (from == to && !to->remap_slice)
- return true;
-
- return false;
-}
-
/**
* i915_switch_context() - perform a GPU context switch.
* @ring: ring for which we'll execute the context switch
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d3a89e6..a90c702 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1093,6 +1093,13 @@ i915_gem_ringbuffer_submission(struct drm_device *dev, struct drm_file *file,
if (ret)
goto error;
+ if (ctx->ppgtt)
+ WARN(ctx->ppgtt->base.pd_reload_mask & (1<<ring->id),
+ "%s didn't clear reload\n", ring->name);
+ else
+ WARN(dev_priv->mm.aliasing_ppgtt->base.pd_reload_mask &
+ (1<<ring->id), "%s didn't clear reload\n", ring->name);
+
instp_mode = args->flags & I915_EXEC_CONSTANTS_MASK;
instp_mask = I915_EXEC_CONSTANTS_MASK;
switch (instp_mode) {
@@ -1345,6 +1352,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
if (ret)
goto err;
+ /* XXX: Reserve has possibly change PDEs which means we must do a
+ * context switch before we can coherently read some of the reserved
+ * VMAs. */
+
/* The objects are in their final locations, apply the relocations. */
if (need_relocs)
ret = i915_gem_execbuffer_relocate(eb);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a2686a8..ab02dad 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1234,6 +1234,15 @@ i915_ppgtt_create(struct drm_device *dev, struct drm_i915_file_private *fpriv)
return ppgtt;
}
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+#define ppgtt_invalidate_tlbs(vm) do {\
+ /* If current vm != vm, */ \
+ vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
+} while (0)
+
void i915_ppgtt_release(struct kref *kref)
{
struct i915_hw_ppgtt *ppgtt =
@@ -1267,6 +1276,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
vma->node.size);
if (ret)
return ret;
+
+ ppgtt_invalidate_tlbs(vma->vm);
}
vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
@@ -1280,9 +1291,11 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
vma->node.start,
vma->obj->base.size,
true);
- if (vma->vm->teardown_va_range)
+ if (vma->vm->teardown_va_range) {
vma->vm->teardown_va_range(vma->vm,
vma->node.start, vma->node.size);
+ ppgtt_invalidate_tlbs(vma->vm);
+ }
}
extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index eb225ab..044ac67 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -213,6 +213,8 @@ struct i915_address_space {
struct page *page;
} scratch;
+ unsigned long pd_reload_mask;
+
/**
* List of objects currently involved in rendering.
*
--
2.0.3
* [RFC 27/38] drm/i915: Initialize all contexts
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (25 preceding siblings ...)
2014-10-07 17:11 ` [RFC 26/38] drm/i915: Track page table reload need Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 28/38] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
` (12 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
The problem is we're going to switch to a new context, which could be
the default context. The plan was to use restore inhibit, which would be
fine, except if we are using dynamic page tables (which we will). If we
use dynamic page tables and we don't load new page tables, the previous
page tables might go away, and future operations will fault.
CTXA runs.
switch to default, restore inhibit
CTXA dies and has its address space taken away.
Run CTXB, tries to save using the context A's address space - this
fails.
The general solution is to make sure every context has its own state,
and its own address space. For cases when we must restore inhibit, first
thing we do is load a valid address space. I thought this would be
enough, but apparently there are references within the context itself
which will refer to the old address space - therefore, we also must
reinitialize.
It was tricky to track this down as we don't have much insight into what
happens in a context save.
This is required for the next patch which enables dynamic page tables.
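The flag selection this series converges on can be modelled in a few
lines of standalone C. The constants and the helper below are
illustrative stand-ins, not the real MI_SET_CONTEXT bit values: an
uninitialized context gets a restore-inhibited switch (and, on GEN8 with
full PPGTT, a PD load afterwards so it ends up with a valid address
space), while a dirty page directory on an already-initialized context
forces a restore.

```c
#include <assert.h>
#include <stdbool.h>

#define MI_RESTORE_INHIBIT (1u << 0)	/* illustrative values, not the hw bits */
#define MI_FORCE_RESTORE   (1u << 1)

/* Decide the MI_SET_CONTEXT flags, and whether a page-directory load
 * must follow the switch (GEN8 full PPGTT with an inhibited restore). */
static unsigned context_switch_flags(bool initialized, bool pd_dirty,
				     bool is_gen8, bool full_ppgtt,
				     bool *needs_pd_load_post)
{
	unsigned hw_flags = 0;

	if (!initialized)
		hw_flags |= MI_RESTORE_INHIBIT;	/* ctx must then get a valid VM */
	else if (pd_dirty)
		hw_flags |= MI_FORCE_RESTORE;	/* same ctx, but PDEs changed */

	*needs_pd_load_post = is_gen8 && full_ppgtt &&
			      (hw_flags & MI_RESTORE_INHIBIT);
	return hw_flags;
}
```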
v2: to->ppgtt is only valid in full ppgtt.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_gem_context.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 7218849..eb9c4b3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -544,13 +544,6 @@ needs_pd_load_pre(struct intel_engine_cs *ring, struct intel_context *to)
(ring != &dev_priv->ring[RCS])) && to->ppgtt;
}
-static bool
-needs_pd_load_post(struct intel_engine_cs *ring, struct intel_context *to)
-{
- return IS_GEN8(ring->dev) &&
- (to->ppgtt || &to->ppgtt->base.pd_reload_mask);
-}
-
static int do_switch(struct intel_engine_cs *ring,
struct intel_context *to)
{
@@ -625,20 +618,24 @@ static int do_switch(struct intel_engine_cs *ring,
/* GEN8 does *not* require an explicit reload if the PDPs have been
* setup, and we do not wish to move them.
- *
- * XXX: If we implemented page directory eviction code, this
- * optimization needs to be removed.
*/
- if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to))
+ if (!to->legacy_hw_ctx.initialized) {
hw_flags |= MI_RESTORE_INHIBIT;
- else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
+ /* NB: If we inhibit the restore, the context is not allowed to
+ * die because future work may end up depending on valid address
+ * space. This means we must enforce that a page table load
+ * occur when this occurs. */
+ } else if (to->ppgtt && test_and_clear_bit(ring->id, &to->ppgtt->base.pd_reload_mask))
hw_flags |= MI_FORCE_RESTORE;
ret = mi_set_context(ring, to, hw_flags);
if (ret)
goto unpin_out;
- if (needs_pd_load_post(ring, to)) {
+ if (IS_GEN8(ring->dev) && to->ppgtt && (hw_flags & MI_RESTORE_INHIBIT)) {
+ /* We have a valid page directory (scratch) to switch to. This
+ * allows the old VM to be freed. Note that if anything occurs
+ * between the set context, and here, we are f*cked */
ret = to->ppgtt->switch_mm(to->ppgtt, ring);
/* The hardware context switch is emitted, but we haven't
* actually changed the state - so it's probably safe to bail
@@ -687,7 +684,7 @@ static int do_switch(struct intel_engine_cs *ring,
i915_gem_context_unreference(from);
}
- uninitialized = !to->legacy_hw_ctx.initialized && from == NULL;
+ uninitialized = !to->legacy_hw_ctx.initialized;
to->legacy_hw_ctx.initialized = true;
done:
--
2.0.3
* [RFC 28/38] drm/i915: Finish gen6/7 dynamic page table allocation
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (26 preceding siblings ...)
2014-10-07 17:11 ` [RFC 27/38] drm/i915: Initialize all contexts Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 29/38] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
` (11 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly when the object is destroyed, we will remove, and free
the page table pointing the PDE back to the scratch page.
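The steady state described above can be sketched as standalone C. This
is a simplified model under stated assumptions (the sizes and names are
illustrative, and the real code tracks per-PTE usage with bitmaps):
every PDE starts out pointing at a single shared scratch page table, a
real page table is allocated lazily on first use of a PDE's range, and
teardown frees it and points the PDE back at scratch.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_PDES 512	/* illustrative PDE count for one page directory */

struct pagetab {
	int used_ptes;	/* stand-in for the driver's used_ptes bitmap */
};

static struct pagetab scratch_pt;	/* shared, spec-recommended default */
static struct pagetab *pd[NUM_PDES];

static void pd_init(void)
{
	for (int i = 0; i < NUM_PDES; i++)
		pd[i] = &scratch_pt;	/* steady state: everything scratch */
}

/* Allocate a real page table the first time a PDE's range is used. */
static struct pagetab *alloc_va(int pde)
{
	if (pd[pde] == &scratch_pt) {
		struct pagetab *pt = calloc(1, sizeof(*pt));
		if (!pt)
			return NULL;
		pd[pde] = pt;
	}
	return pd[pde];
}

/* When the last mapping in a PDE's range is gone, free the page table
 * and point the PDE back at the scratch page table. */
static void teardown_va(int pde)
{
	if (pd[pde] != &scratch_pt) {
		free(pd[pde]);
		pd[pde] = &scratch_pt;
	}
}
```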
Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I did.
The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV. Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.
We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.
v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
both all implementations get the trace.
v3: Updated trace event to spit out a name
v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v4)
---
drivers/gpu/drm/i915/i915_debugfs.c | 19 +++++-
drivers/gpu/drm/i915/i915_drv.h | 7 ++
drivers/gpu/drm/i915/i915_gem_gtt.c | 124 +++++++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_trace.h | 116 +++++++++++++++++++++++++++++++++
4 files changed, 255 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2833974..6ad937e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2007,9 +2007,25 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
return 0;
}
+static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
+{
+ struct i915_pagedir *pd = &ppgtt->pd;
+ struct i915_pagetab **pt = &pd->page_tables[0];
+ size_t cnt = 0;
+ int i;
+
+ for (i = 0; i < ppgtt->num_pd_entries; i++) {
+ if (pt[i] != ppgtt->scratch_pt)
+ cnt++;
+ }
+
+ return cnt;
+}
+
static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt)
{
seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+ seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
}
static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
@@ -2081,6 +2097,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
}
+ seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
if (dev_priv->mm.aliasing_ppgtt) {
struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
@@ -2098,7 +2116,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool ver
idr_for_each(&file_priv->context_idr, per_file_ctx,
(void *)(unsigned long)m);
}
- seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
}
static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c0fea18..7dcc08c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2588,6 +2588,13 @@ static inline bool i915_is_ggtt(struct i915_address_space *vm)
return vm == ggtt;
}
+static inline bool i915_is_aliasing_ppgtt(struct i915_address_space *vm)
+{
+ struct i915_address_space *appgtt =
+ &((struct drm_i915_private *)(vm)->dev->dev_private)->mm.aliasing_ppgtt->base;
+ return vm == appgtt;
+}
+
static inline struct i915_hw_ppgtt *
i915_vm_to_ppgtt(struct i915_address_space *vm)
{
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ab02dad..e221874 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -968,10 +968,47 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
static int gen6_alloc_va_range(struct i915_address_space *vm,
uint64_t start, uint64_t length)
{
+ DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD);
+ struct drm_device *dev = vm->dev;
+ struct drm_i915_private *dev_priv = dev->dev_private;
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_pagetab *pt;
+ const uint32_t start_save = start, length_save = length;
uint32_t pde, temp;
+ int ret;
+
+ BUG_ON(upper_32_bits(start));
+
+ bitmap_zero(new_page_tables, I915_PDES_PER_PD);
+
+ /* The allocation is done in two stages so that we can bail out with
+ * minimal amount of pain. The first stage finds new page tables that
+ * need allocation. The second stage marks use ptes within the page
+ * tables.
+ */
+ gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+ if (pt != ppgtt->scratch_pt) {
+ WARN_ON(bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT));
+ continue;
+ }
+
+ /* We've already allocated a page table */
+ WARN_ON(!bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT));
+
+ pt = alloc_pt_single(dev);
+ if (IS_ERR(pt)) {
+ ret = PTR_ERR(pt);
+ goto unwind_out;
+ }
+
+ ppgtt->pd.page_tables[pde] = pt;
+ set_bit(pde, new_page_tables);
+ trace_i915_pagetable_alloc(vm, pde, start, GEN6_PDE_SHIFT);
+ }
+
+ start = start_save;
+ length = length_save;
gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
int j;
@@ -989,11 +1026,32 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
}
}
- bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap,
+ if (test_and_clear_bit(pde, new_page_tables))
+ gen6_map_single(&ppgtt->pd, pde, pt);
+
+ trace_i915_pagetable_map(vm, pde, pt,
+ gen6_pte_index(start),
+ gen6_pte_count(start, length),
+ GEN6_PTES_PER_PT);
+ bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes,
GEN6_PTES_PER_PT);
}
+ WARN_ON(!bitmap_empty(new_page_tables, I915_PDES_PER_PD));
+
+ /* Make sure write is complete before other code can use this page
+ * table. Also required for WC mapped PTEs */
+ readl(dev_priv->gtt.gsm);
+
return 0;
+
+unwind_out:
+ for_each_set_bit(pde, new_page_tables, I915_PDES_PER_PD) {
+ struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];
+ ppgtt->pd.page_tables[pde] = NULL;
+ free_pt_single(pt, vm->dev);
+ }
+ return ret;
}
static void gen6_teardown_va_range(struct i915_address_space *vm,
@@ -1005,8 +1063,27 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
uint32_t pde, temp;
gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
+
+ if (WARN(pt == ppgtt->scratch_pt,
+ "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n",
+ vm, pde, start, start + length))
+ continue;
+
+ trace_i915_pagetable_unmap(vm, pde, pt,
+ gen6_pte_index(start),
+ gen6_pte_count(start, length),
+ GEN6_PTES_PER_PT);
+
bitmap_clear(pt->used_ptes, gen6_pte_index(start),
gen6_pte_count(start, length));
+
+ if (bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT)) {
+ trace_i915_pagetable_destroy(vm, pde,
+ start & GENMASK_ULL(64, GEN6_PDE_SHIFT),
+ GEN6_PDE_SHIFT);
+ gen6_map_single(&ppgtt->pd, pde, ppgtt->scratch_pt);
+ ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+ }
}
}
@@ -1014,9 +1091,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
int i;
- for (i = 0; i < ppgtt->num_pd_entries; i++)
- free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+ for (i = 0; i < ppgtt->num_pd_entries; i++) {
+ struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+ if (pt != ppgtt->scratch_pt)
+ free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+ }
+ /* Consider putting this as part of pd free. */
free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
free_pd_single(&ppgtt->pd, ppgtt->base.dev);
}
@@ -1080,7 +1161,7 @@ err_out:
return ret;
}
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
{
int ret;
@@ -1088,9 +1169,13 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
if (ret)
return ret;
+ if (!preallocate_pt)
+ return 0;
+
ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
ppgtt->base.dev);
if (ret) {
+ free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
drm_mm_remove_node(&ppgtt->node);
return ret;
}
@@ -1098,8 +1183,17 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
return 0;
}
+static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+ uint64_t start, uint64_t length)
+{
+ struct i915_pagetab *unused;
+ uint32_t pde, temp;
+
+ gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde)
+ ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
+}
-static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
{
struct drm_device *dev = ppgtt->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1115,7 +1209,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
} else
BUG();
- ret = gen6_ppgtt_alloc(ppgtt);
+ ret = gen6_ppgtt_alloc(ppgtt, aliasing);
if (ret)
return ret;
@@ -1134,6 +1228,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+ if (!aliasing)
+ gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total);
+
gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total);
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1146,7 +1243,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
return 0;
}
-static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
+static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
+ bool aliasing)
{
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1154,7 +1252,7 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
ppgtt->base.scratch = dev_priv->gtt.base.scratch;
if (INTEL_INFO(dev)->gen < 8)
- return gen6_ppgtt_init(ppgtt);
+ return gen6_ppgtt_init(ppgtt, aliasing);
else if (IS_GEN8(dev) || IS_GEN9(dev))
return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
else
@@ -1165,7 +1263,7 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
struct drm_i915_private *dev_priv = dev->dev_private;
int ret = 0;
- ret = __hw_ppgtt_init(dev, ppgtt);
+ ret = __hw_ppgtt_init(dev, ppgtt, false);
if (ret == 0) {
kref_init(&ppgtt->ref);
drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
@@ -1271,6 +1369,8 @@ ppgtt_bind_vma(struct i915_vma *vma,
flags |= PTE_READ_ONLY;
if (vma->vm->allocate_va_range) {
+ trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+ VM_TO_TRACE_NAME(vma->vm));
ret = vma->vm->allocate_va_range(vma->vm,
vma->node.start,
vma->node.size);
@@ -1292,6 +1392,10 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
vma->obj->base.size,
true);
if (vma->vm->teardown_va_range) {
+ trace_i915_va_teardown(vma->vm,
+ vma->node.start, vma->node.size,
+ VM_TO_TRACE_NAME(vma->vm));
+
vma->vm->teardown_va_range(vma->vm,
vma->node.start, vma->node.size);
ppgtt_invalidate_tlbs(vma->vm);
@@ -1823,7 +1927,7 @@ int i915_gem_setup_global_gtt(struct drm_device *dev,
if (!ppgtt)
return -ENOMEM;
- ret = __hw_ppgtt_init(dev, ppgtt);
+ ret = __hw_ppgtt_init(dev, ppgtt, true);
if (ret != 0)
return ret;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index cbf5521..2d21c54 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -156,6 +156,122 @@ TRACE_EVENT(i915_vma_unbind,
__entry->obj, __entry->offset, __entry->size, __entry->vm)
);
+#define VM_TO_TRACE_NAME(vm) \
+ (i915_is_ggtt(vm) ? "GGTT" : \
+ i915_is_aliasing_ppgtt(vm) ? "Aliasing PPGTT" : \
+ "Private VM")
+
+DECLARE_EVENT_CLASS(i915_va,
+ TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+ TP_ARGS(vm, start, length, name),
+
+ TP_STRUCT__entry(
+ __field(struct i915_address_space *, vm)
+ __field(u64, start)
+ __field(u64, end)
+ __string(name, name)
+ ),
+
+ TP_fast_assign(
+ __entry->vm = vm;
+ __entry->start = start;
+ __entry->end = start + length;
+ __assign_str(name, name);
+ ),
+
+ TP_printk("vm=%p (%s), 0x%llx-0x%llx",
+ __entry->vm, __get_str(name), __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_va, i915_va_alloc,
+ TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+ TP_ARGS(vm, start, length, name)
+);
+
+DEFINE_EVENT(i915_va, i915_va_teardown,
+ TP_PROTO(struct i915_address_space *vm, u64 start, u64 length, const char *name),
+ TP_ARGS(vm, start, length, name)
+);
+
+DECLARE_EVENT_CLASS(i915_pagetable,
+ TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+ TP_ARGS(vm, pde, start, pde_shift),
+
+ TP_STRUCT__entry(
+ __field(struct i915_address_space *, vm)
+ __field(u32, pde)
+ __field(u64, start)
+ __field(u64, end)
+ ),
+
+ TP_fast_assign(
+ __entry->vm = vm;
+ __entry->pde = pde;
+ __entry->start = start;
+ __entry->end = (start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1);
+ ),
+
+ TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
+ __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc,
+ TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+ TP_ARGS(vm, pde, start, pde_shift)
+);
+
+DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy,
+ TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
+ TP_ARGS(vm, pde, start, pde_shift)
+);
+
+/* Avoid extra math because we only support two sizes. The format is defined by
+ * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
+#define TRACE_PT_SIZE(bits) \
+ ((((bits) == 1024) ? 288 : 144) + 1)
+
+DECLARE_EVENT_CLASS(i915_pagetable_update,
+ TP_PROTO(struct i915_address_space *vm, u32 pde,
+ struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+ TP_ARGS(vm, pde, pt, first, len, bits),
+
+ TP_STRUCT__entry(
+ __field(struct i915_address_space *, vm)
+ __field(u32, pde)
+ __field(u32, first)
+ __field(u32, last)
+ __dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits))
+ ),
+
+ TP_fast_assign(
+ __entry->vm = vm;
+ __entry->pde = pde;
+ __entry->first = first;
+ __entry->last = first + len;
+
+ bitmap_scnprintf(__get_str(cur_ptes),
+ TRACE_PT_SIZE(bits),
+ pt->used_ptes,
+ bits);
+ ),
+
+ TP_printk("vm=%p, pde=%d, updating %u:%u\t%s",
+ __entry->vm, __entry->pde, __entry->last, __entry->first,
+ __get_str(cur_ptes))
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map,
+ TP_PROTO(struct i915_address_space *vm, u32 pde,
+ struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+ TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
+DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap,
+ TP_PROTO(struct i915_address_space *vm, u32 pde,
+ struct i915_pagetab *pt, u32 first, u32 len, size_t bits),
+ TP_ARGS(vm, pde, pt, first, len, bits)
+);
+
TRACE_EVENT(i915_gem_object_change_domain,
TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write),
TP_ARGS(obj, old_read, old_write),
--
2.0.3
* [RFC 29/38] drm/i915/bdw: Use dynamic allocation idioms on free
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (27 preceding siblings ...)
2014-10-07 17:11 ` [RFC 28/38] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 30/38] drm/i915/bdw: pagedirs rework allocation Michel Thierry
` (10 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.
v2: Match trace_i915_va_teardown params
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 46 ++++++++++++++++++++++++-------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++
2 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e221874..8c3bb45 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -522,27 +522,41 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
}
}
-static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device *dev)
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
{
- int i;
-
- if (!pd->page)
- return;
-
- for (i = 0; i < I915_PDES_PER_PD; i++) {
- free_pt_single(pd->page_tables[i], dev);
- pd->page_tables[i] = NULL;
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_pagedir *pd;
+ struct i915_pagetab *pt;
+ uint64_t temp;
+ uint32_t pdpe, pde;
+
+ gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+ uint64_t pd_len = gen8_clamp_pd(start, length);
+ uint64_t pd_start = start;
+ gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+ free_pt_single(pt, vm->dev);
+ }
+ free_pd_single(pd, vm->dev);
}
}
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+/* This function will die soon */
+static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
{
- int i;
+ gen8_teardown_va_range(&ppgtt->base,
+ i << GEN8_PDPE_SHIFT,
+ (1 << GEN8_PDPE_SHIFT));
+}
- for (i = 0; i < ppgtt->num_pd_pages; i++) {
- gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
- free_pd_single(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
- }
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+ trace_i915_va_teardown(&ppgtt->base,
+ ppgtt->base.start, ppgtt->base.total,
+ VM_TO_TRACE_NAME(&ppgtt->base));
+ gen8_teardown_va_range(&ppgtt->base,
+ ppgtt->base.start, ppgtt->base.total);
}
static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -568,7 +582,7 @@ static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
unwind_out:
while (i--)
- gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
+ gen8_free_full_pagedir(ppgtt, i);
return -ENOMEM;
}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 044ac67..9032fc4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -397,6 +397,32 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
}
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter) \
+ for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+ length > 0 && iter < I915_PDES_PER_PD; \
+ pt = (pd)->page_tables[++iter], \
+ temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start, \
+ temp = min(temp, length), \
+ start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
+ for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedirs[iter]; \
+ length > 0 && iter < GEN8_LEGACY_PDPES; \
+ pd = (pdp)->pagedirs[++iter], \
+ temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start, \
+ temp = min(temp, length), \
+ start += temp, length -= temp)
+
+/* Clamp length to the next pagedir boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+ uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+ if (next_pd > (start + length))
+ return length;
+
+ return next_pd - start;
+}
+
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
--
2.0.3
* [RFC 30/38] drm/i915/bdw: pagedirs rework allocation
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (28 preceding siblings ...)
2014-10-07 17:11 ` [RFC 29/38] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 31/38] drm/i915/bdw: pagetable allocation rework Michel Thierry
` (9 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++++++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8c3bb45..6f9c79b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -537,8 +537,10 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
uint64_t pd_start = start;
gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
free_pt_single(pt, vm->dev);
+ pd->page_tables[pde] = NULL;
}
free_pd_single(pd, vm->dev);
+ ppgtt->pdp.pagedirs[pdpe] = NULL;
}
}
@@ -587,26 +589,40 @@ unwind_out:
return -ENOMEM;
}
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
- const int max_pdp)
+static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+ uint64_t start,
+ uint64_t length)
{
- int i;
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(pdp, struct i915_hw_ppgtt, pdp);
+ struct i915_pagedir *unused;
+ uint64_t temp;
+ uint32_t pdpe;
- for (i = 0; i < max_pdp; i++) {
- ppgtt->pdp.pagedirs[i] = alloc_pd_single(ppgtt->base.dev);
- if (IS_ERR(ppgtt->pdp.pagedirs[i]))
+ /* FIXME: PPGTT container_of won't work for 64b */
+ BUG_ON((start + length) > 0x800000000ULL);
+
+ gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+ BUG_ON(unused);
+ pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
+ if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
goto unwind_out;
+
+ ppgtt->num_pd_pages++;
}
- ppgtt->num_pd_pages = max_pdp;
BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
return 0;
unwind_out:
- while (i--)
- free_pd_single(ppgtt->pdp.pagedirs[i],
+ while (pdpe--) {
+ free_pd_single(ppgtt->pdp.pagedirs[pdpe],
ppgtt->base.dev);
+ ppgtt->num_pd_pages--;
+ }
+
+ WARN_ON(ppgtt->num_pd_pages);
return -ENOMEM;
}
@@ -616,7 +632,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
{
int ret;
- ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+ ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
+ ppgtt->base.total);
if (ret)
return ret;
@@ -653,6 +670,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
if (size % (1<<30))
DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+ ppgtt->base.start = 0;
+ ppgtt->base.total = size;
+ BUG_ON(ppgtt->base.total == 0);
+
/* 1. Do all our allocations for page directories and page tables. */
ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
if (ret)
@@ -685,8 +706,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->base.clear_range = gen8_ppgtt_clear_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
- ppgtt->base.start = 0;
- ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PT * PAGE_SIZE;
DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
--
2.0.3
* [RFC 31/38] drm/i915/bdw: pagetable allocation rework
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (29 preceding siblings ...)
2014-10-07 17:11 ` [RFC 30/38] drm/i915/bdw: pagedirs rework allocation Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 32/38] drm/i915/bdw: Make the pdp switch a bit less hacky Michel Thierry
` (8 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 54 ++++++++++++++++++++-----------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 10 +++++++
2 files changed, 39 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6f9c79b..f30d299 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -544,14 +544,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
}
}
-/* This function will die soon */
-static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
-{
- gen8_teardown_va_range(&ppgtt->base,
- i << GEN8_PDPE_SHIFT,
- (1 << GEN8_PDPE_SHIFT));
-}
-
static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
trace_i915_va_teardown(&ppgtt->base,
@@ -569,22 +561,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
gen8_ppgtt_free(ppgtt);
}
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+ uint64_t start,
+ uint64_t length,
+ struct drm_device *dev)
{
- int i, ret;
+ struct i915_pagetab *unused;
+ uint64_t temp;
+ uint32_t pde;
- for (i = 0; i < ppgtt->num_pd_pages; i++) {
- ret = alloc_pt_range(ppgtt->pdp.pagedirs[i],
- 0, I915_PDES_PER_PD, ppgtt->base.dev);
- if (ret)
+ gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+ BUG_ON(unused);
+ pd->page_tables[pde] = alloc_pt_single(dev);
+ if (IS_ERR(pd->page_tables[pde]))
goto unwind_out;
}
return 0;
unwind_out:
- while (i--)
- gen8_free_full_pagedir(ppgtt, i);
+ while (pde--)
+ free_pt_single(pd->page_tables[pde], dev);
return -ENOMEM;
}
@@ -628,20 +625,28 @@ unwind_out:
}
static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
- const int max_pdp)
+ uint64_t start,
+ uint64_t length)
{
+ struct i915_pagedir *pd;
+ uint64_t temp;
+ uint32_t pdpe;
int ret;
- ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
- ppgtt->base.total);
+ ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
if (ret)
return ret;
- ret = gen8_ppgtt_allocate_page_tables(ppgtt);
- if (ret)
- goto err_out;
+ gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+ ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+ ppgtt->base.dev);
+ if (ret)
+ goto err_out;
+
+ ppgtt->num_pd_entries += I915_PDES_PER_PD;
+ }
- ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
+ BUG_ON(pdpe > ppgtt->num_pd_pages);
return 0;
@@ -672,10 +677,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->base.start = 0;
ppgtt->base.total = size;
- BUG_ON(ppgtt->base.total == 0);
/* 1. Do all our allocations for page directories and page tables. */
- ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+ ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9032fc4..4fca5bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -413,6 +413,16 @@ static inline size_t gen6_pde_count(uint32_t addr, uint32_t length)
temp = min(temp, length), \
start += temp, length -= temp)
+/* Clamp length to the next pagetab boundary */
+static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
+{
+ uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+ if (next_pt > (start + length))
+ return length;
+
+ return next_pt - start;
+}
+
/* Clamp length to the next pagedir boundary */
static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
{
--
2.0.3
* [RFC 32/38] drm/i915/bdw: Make the pdp switch a bit less hacky
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (30 preceding siblings ...)
2014-10-07 17:11 ` [RFC 31/38] drm/i915/bdw: pagetable allocation rework Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 33/38] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
` (7 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
One important part of this patch is that we now write a scratch page
directory into any unused PDP descriptors. This matters for two reasons:
first, it's not clear we're allowed to just use 0 or an invalid
pointer, and second, we must wipe out any previous contents from the last
context.
The latter point only matters with full PPGTT. The former point would
only affect 32b platforms, or platforms with less than 4GB of memory.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 29 ++++++++++++++++++-----------
drivers/gpu/drm/i915/i915_gem_gtt.h | 5 ++++-
2 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f30d299..a267418 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -389,8 +389,9 @@ static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
}
/* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
- uint64_t val)
+static int gen8_write_pdp(struct intel_engine_cs *ring,
+ unsigned entry,
+ dma_addr_t addr)
{
int ret;
@@ -402,10 +403,10 @@ static int gen8_write_pdp(struct intel_engine_cs *ring, unsigned entry,
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
- intel_ring_emit(ring, (u32)(val >> 32));
+ intel_ring_emit(ring, upper_32_bits(addr));
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
- intel_ring_emit(ring, (u32)(val));
+ intel_ring_emit(ring, lower_32_bits(addr));
intel_ring_advance(ring);
return 0;
@@ -416,12 +417,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
{
int i, ret;
- /* bit of a hack to find the actual last used pd */
- int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
-
- for (i = used_pd - 1; i >= 0; i--) {
- dma_addr_t addr = ppgtt->pdp.pagedirs[i]->daddr;
- ret = gen8_write_pdp(ring, i, addr);
+ for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+ struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
+ dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
+ /* The page directory might be NULL, but we need to clear out
+ * whatever the previous context might have used. */
+ ret = gen8_write_pdp(ring, i, pd_daddr);
if (ret)
return ret;
}
@@ -678,10 +679,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->base.start = 0;
ppgtt->base.total = size;
+ ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+ if (IS_ERR(ppgtt->scratch_pd))
+ return PTR_ERR(ppgtt->scratch_pd);
+
/* 1. Do all our allocations for page directories and page tables. */
ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
- if (ret)
+ if (ret) {
+ free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
return ret;
+ }
/*
* 2. Map all the page directory entires to point to the page tables
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 4fca5bc..e3a761a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -270,7 +270,10 @@ struct i915_hw_ppgtt {
struct i915_pagedir pd;
};
- struct i915_pagetab *scratch_pt;
+ union {
+ struct i915_pagetab *scratch_pt;
+ struct i915_pagetab *scratch_pd; /* Just need the daddr */
+ };
struct drm_i915_file_private *file_priv;
--
2.0.3
* [RFC 33/38] drm/i915: num_pd_pages/num_pd_entries isn't useful
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (31 preceding siblings ...)
2014-10-07 17:11 ` [RFC 32/38] drm/i915/bdw: Make the pdp switch a bit less hacky Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 34/38] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
` (6 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
These values are of no real use once the page tables are allocated
dynamically. Getting rid of them will help prevent later confusion.
TODO: this probably needs to be earlier in the series
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_debugfs.c | 11 ++++-----
drivers/gpu/drm/i915/i915_gem_gtt.c | 45 ++++++++++---------------------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 7 ++++--
3 files changed, 21 insertions(+), 42 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6ad937e..96c1014 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2009,13 +2009,12 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
{
- struct i915_pagedir *pd = &ppgtt->pd;
- struct i915_pagetab **pt = &pd->page_tables[0];
+ struct i915_pagetab *pt;
size_t cnt = 0;
- int i;
+ uint32_t useless;
- for (i = 0; i < ppgtt->num_pd_entries; i++) {
- if (pt[i] != ppgtt->scratch_pt)
+ gen6_for_all_pdes(pt, ppgtt, useless) {
+ if (pt != ppgtt->scratch_pt)
cnt++;
}
@@ -2038,8 +2037,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verb
if (!ppgtt)
return;
- seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
- seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
for_each_ring(ring, dev_priv, unused) {
seq_printf(m, "%s\n", ring->name);
for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a267418..ed27c29 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -605,22 +605,14 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
goto unwind_out;
-
- ppgtt->num_pd_pages++;
}
- BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
return 0;
unwind_out:
- while (pdpe--) {
+ while (pdpe--)
free_pd_single(ppgtt->pdp.pagedirs[pdpe],
ppgtt->base.dev);
- ppgtt->num_pd_pages--;
- }
-
- WARN_ON(ppgtt->num_pd_pages);
return -ENOMEM;
}
@@ -643,12 +635,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
ppgtt->base.dev);
if (ret)
goto err_out;
-
- ppgtt->num_pd_entries += I915_PDES_PER_PD;
}
- BUG_ON(pdpe > ppgtt->num_pd_pages);
-
return 0;
/* TODO: Check this for all cases */
@@ -670,7 +658,6 @@ err_out:
static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
{
const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
- const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
int i, j, ret;
if (size % (1<<30))
@@ -718,27 +705,21 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
- DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d wasted)\n",
- ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
- DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
- ppgtt->num_pd_entries,
- (ppgtt->num_pd_entries - min_pt_pages) + size % (1<<30));
return 0;
}
static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
{
struct i915_address_space *vm = &ppgtt->base;
+ struct i915_pagetab *unused;
gen6_gtt_pte_t scratch_pte;
uint32_t pd_entry;
- int pte, pde;
+ uint32_t pte, pde, temp;
+ uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true, 0);
- seq_printf(m, " VM %p (pd_offset %x-%x):\n", vm,
- ppgtt->pd.pd_offset,
- ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
- for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+ gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
u32 expected;
gen6_gtt_pte_t *pt_vaddr;
dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
@@ -1133,12 +1114,12 @@ static void gen6_teardown_va_range(struct i915_address_space *vm,
static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
- int i;
+ struct i915_pagetab *pt;
+ uint32_t pde;
- for (i = 0; i < ppgtt->num_pd_entries; i++) {
- struct i915_pagetab *pt = ppgtt->pd.page_tables[i];
+ gen6_for_all_pdes(pt, ppgtt, pde) {
if (pt != ppgtt->scratch_pt)
- free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev);
+ free_pt_single(pt, ppgtt->base.dev);
}
/* Consider putting this as part of pd free. */
@@ -1197,7 +1178,6 @@ alloc:
if (ppgtt->node.start < dev_priv->gtt.mappable_end)
DRM_DEBUG("Forced to use aperture for PDEs\n");
- ppgtt->num_pd_entries = I915_PDES_PER_PD;
return 0;
err_out:
@@ -1216,8 +1196,7 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt)
if (!preallocate_pt)
return 0;
- ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries,
- ppgtt->base.dev);
+ ret = alloc_pt_range(&ppgtt->pd, 0, I915_PDES_PER_PD, ppgtt->base.dev);
if (ret) {
free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev);
drm_mm_remove_node(&ppgtt->node);
@@ -1263,7 +1242,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
ppgtt->base.start = 0;
- ppgtt->base.total = ppgtt->num_pd_entries * GEN6_PTES_PER_PT * PAGE_SIZE;
+ ppgtt->base.total = I915_PDES_PER_PD * GEN6_PTES_PER_PT * PAGE_SIZE;
ppgtt->debug_dump = gen6_dump_ppgtt;
ppgtt->pd.pd_offset =
@@ -1599,7 +1578,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
if (i915_is_ggtt(vm))
ppgtt = dev_priv->mm.aliasing_ppgtt;
- gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
+ gen6_map_page_range(dev_priv, &ppgtt->pd, 0, I915_PDES_PER_PD);
}
i915_ggtt_flush(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e3a761a..da27cc4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -263,8 +263,6 @@ struct i915_hw_ppgtt {
struct i915_address_space base;
struct kref ref;
struct drm_mm_node node;
- unsigned num_pd_entries;
- unsigned num_pd_pages; /* gen8+ */
union {
struct i915_pagedirpo pdp;
struct i915_pagedir pd;
@@ -330,6 +328,11 @@ struct i915_gtt {
temp = min(temp, (unsigned)length), \
start += temp, length -= temp)
+#define gen6_for_all_pdes(pt, ppgtt, iter) \
+ for (iter = 0, pt = ppgtt->pd.page_tables[iter]; \
+ iter < gen6_pde_index(ppgtt->base.total); \
+ pt = ppgtt->pd.page_tables[++iter])
+
static inline uint32_t i915_pte_index(uint64_t address, uint32_t pde_shift)
{
const uint32_t mask = NUM_PTE(pde_shift) - 1;
--
2.0.3
* [RFC 34/38] drm/i915: Extract PPGTT param from pagedir alloc
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (32 preceding siblings ...)
2014-10-07 17:11 ` [RFC 33/38] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 35/38] drm/i915/bdw: Split out mappings Michel Thierry
` (5 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
Now that we don't need to trace num_pd_pages, we may as well kill all
need for the PPGTT structure in the alloc_pagedirs. This will be very useful
when we move to 48b addressing, where the PDP is no longer the root of the
page table structure.
The param is replaced with drm_device, which is an unavoidable wart
throughout the series. (in other words, not extra flagrant).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ed27c29..c6e2242 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -589,10 +589,9 @@ unwind_out:
static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
uint64_t start,
- uint64_t length)
+ uint64_t length,
+ struct drm_device *dev)
{
- struct i915_hw_ppgtt *ppgtt =
- container_of(pdp, struct i915_hw_ppgtt, pdp);
struct i915_pagedir *unused;
uint64_t temp;
uint32_t pdpe;
@@ -602,8 +601,8 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
BUG_ON(unused);
- pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
- if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
+ pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+ if (IS_ERR(pdp->pagedirs[pdpe]))
goto unwind_out;
}
@@ -611,8 +610,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
unwind_out:
while (pdpe--)
- free_pd_single(ppgtt->pdp.pagedirs[pdpe],
- ppgtt->base.dev);
+ free_pd_single(pdp->pagedirs[pdpe], dev);
return -ENOMEM;
}
@@ -626,7 +624,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
uint32_t pdpe;
int ret;
- ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
+ ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
+ ppgtt->base.dev);
if (ret)
return ret;
--
2.0.3
* [RFC 35/38] drm/i915/bdw: Split out mappings
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (33 preceding siblings ...)
2014-10-07 17:11 ` [RFC 34/38] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 36/38] drm/i915/bdw: begin bitmap tracking Michel Thierry
` (4 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
This patch adds the functionality and calls it at init, so there should
be no functional change.
The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 95 ++++++++++++++++++++-----------------
1 file changed, 51 insertions(+), 44 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c6e2242..9403c60 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -523,6 +523,36 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
}
}
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+ struct i915_pagetab *pt,
+ struct drm_device *dev)
+{
+ gen8_ppgtt_pde_t entry =
+ gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+ *pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+ uint64_t start,
+ uint64_t length,
+ struct drm_device *dev)
+{
+ gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+ struct i915_pagetab *pt;
+ uint64_t temp, pde;
+
+ gen8_for_each_pde(pt, pd, start, length, temp, pde)
+ __gen8_do_map_pt(pagedir + pde, pt, dev);
+
+ if (!HAS_LLC(dev))
+ drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+ kunmap_atomic(pagedir);
+}
+
static void gen8_teardown_va_range(struct i915_address_space *vm,
uint64_t start, uint64_t length)
{
@@ -547,9 +577,6 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
- trace_i915_va_teardown(&ppgtt->base,
- ppgtt->base.start, ppgtt->base.total,
- VM_TO_TRACE_NAME(&ppgtt->base));
gen8_teardown_va_range(&ppgtt->base,
ppgtt->base.start, ppgtt->base.total);
}
@@ -615,11 +642,14 @@ unwind_out:
return -ENOMEM;
}
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
- uint64_t start,
- uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+ uint64_t start,
+ uint64_t length)
{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
struct i915_pagedir *pd;
+ const uint64_t orig_start = start;
uint64_t temp;
uint32_t pdpe;
int ret;
@@ -638,9 +668,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
return 0;
- /* TODO: Check this for all cases */
err_out:
- gen8_ppgtt_free(ppgtt);
+ gen8_teardown_va_range(vm, orig_start, start);
return ret;
}
@@ -650,59 +679,37 @@ err_out:
* PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
* space.
*
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
*/
static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
{
- const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
- int i, j, ret;
-
- if (size % (1<<30))
- DRM_INFO("Pages will be wasted unless GTT size (%llu) is divisible by 1GB\n", size);
+ struct i915_pagedir *pd;
+ uint64_t temp, start = 0;
+ const uint64_t orig_length = size;
+ uint32_t pdpe;
+ int ret;
ppgtt->base.start = 0;
ppgtt->base.total = size;
+ ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+ ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+ ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+ ppgtt->switch_mm = gen8_mm_switch;
ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
if (IS_ERR(ppgtt->scratch_pd))
return PTR_ERR(ppgtt->scratch_pd);
- /* 1. Do all our allocations for page directories and page tables. */
- ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+ ret = gen8_alloc_va_range(&ppgtt->base, start, size);
if (ret) {
free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
return ret;
}
- /*
- * 2. Map all the page directory entires to point to the page tables
- * we've allocated.
- *
- * For now, the PPGTT helper functions all require that the PDEs are
- * plugged in correctly. So we do that now/here. For aliasing PPGTT, we
- * will never need to touch the PDEs again.
- */
- for (i = 0; i < max_pdp; i++) {
- struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
- gen8_ppgtt_pde_t *pd_vaddr;
- pd_vaddr = kmap_atomic(ppgtt->pdp.pagedirs[i]->page);
- for (j = 0; j < I915_PDES_PER_PD; j++) {
- struct i915_pagetab *pt = pd->page_tables[j];
- dma_addr_t addr = pt->daddr;
- pd_vaddr[j] = gen8_pde_encode(ppgtt->base.dev, addr,
- I915_CACHE_LLC);
- }
- if (!HAS_LLC(ppgtt->base.dev))
- drm_clflush_virt_range(pd_vaddr, PAGE_SIZE);
- kunmap_atomic(pd_vaddr);
- }
+ start = 0;
+ size = orig_length;
- ppgtt->switch_mm = gen8_mm_switch;
- ppgtt->base.clear_range = gen8_ppgtt_clear_range;
- ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
- ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+ gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
+ gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
return 0;
}
--
2.0.3
* [RFC 36/38] drm/i915/bdw: begin bitmap tracking
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (34 preceding siblings ...)
2014-10-07 17:11 ` [RFC 35/38] drm/i915/bdw: Split out mappings Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 37/38] drm/i915/bdw: Dynamic page table allocations Michel Thierry
` (3 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
As with gen6/7, we can enable bitmap tracking alongside all the
preallocations to make sure things actually don't blow up.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 101 +++++++++++++++++++++++++++++++-----
drivers/gpu/drm/i915/i915_gem_gtt.h | 12 +++++
2 files changed, 99 insertions(+), 14 deletions(-)
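The bookkeeping this patch adds (set a bit on allocation, clear it on teardown, free the backing structure once the bitmap is empty) can be modeled in userspace. A minimal sketch, assuming hypothetical helper names — the kernel itself uses `set_bit()`, `test_and_clear_bit()` and `bitmap_empty()` from `<linux/bitmap.h>`:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PDES_PER_PD 512
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS ((PDES_PER_PD + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Toy stand-in for i915_pagedir: just the used-entry bitmap. */
struct pd_model {
	unsigned long used_pdes[BITMAP_LONGS];
};

/* Mark a PDE as in use (analogous to set_bit on pd->used_pdes). */
static void pd_set(struct pd_model *pd, unsigned int pde)
{
	pd->used_pdes[pde / BITS_PER_LONG] |= 1UL << (pde % BITS_PER_LONG);
}

/* Clear a PDE, reporting whether it was set (test_and_clear_bit). */
static bool pd_test_and_clear(struct pd_model *pd, unsigned int pde)
{
	unsigned long mask = 1UL << (pde % BITS_PER_LONG);
	bool was_set = (pd->used_pdes[pde / BITS_PER_LONG] & mask) != 0;

	pd->used_pdes[pde / BITS_PER_LONG] &= ~mask;
	return was_set;
}

/* True when no PDE is in use; the patch frees the directory then. */
static bool pd_empty(const struct pd_model *pd)
{
	for (unsigned int i = 0; i < BITMAP_LONGS; i++)
		if (pd->used_pdes[i])
			return false;
	return true;
}
```

The teardown path in the patch follows exactly this shape: clear the range's bits, then free the page table only when its bitmap reads empty.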
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9403c60..6703721 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -352,8 +352,12 @@ err_out:
static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
{
+ WARN(!bitmap_empty(pd->used_pdes, I915_PDES_PER_PD),
+ "Free page directory with %d used pages\n",
+ bitmap_weight(pd->used_pdes, I915_PDES_PER_PD));
i915_dma_unmap_single(pd, dev);
__free_page(pd->page);
+ kfree(pd->used_pdes);
kfree(pd);
}
@@ -366,26 +370,35 @@ static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
{
struct i915_pagedir *pd;
- int ret;
+ int ret = -ENOMEM;
pd = kzalloc(sizeof(*pd), GFP_KERNEL);
if (!pd)
return ERR_PTR(-ENOMEM);
+ pd->used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
+ sizeof(*pd->used_pdes), GFP_KERNEL);
+ if (!pd->used_pdes)
+ goto free_pd;
+
pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
- if (!pd->page) {
- kfree(pd);
- return ERR_PTR(-ENOMEM);
- }
+ if (!pd->page)
+ goto free_bitmap;
ret = i915_dma_map_px_single(pd, dev);
- if (ret) {
- __free_page(pd->page);
- kfree(pd);
- return ERR_PTR(ret);
- }
+ if (ret)
+ goto free_page;
return pd;
+
+free_page:
+ __free_page(pd->page);
+free_bitmap:
+ kfree(pd->used_pdes);
+free_pd:
+ kfree(pd);
+
+ return ERR_PTR(ret);
}
/* Broadwell Page Directory Pointer Descriptors */
@@ -566,12 +579,48 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
uint64_t pd_len = gen8_clamp_pd(start, length);
uint64_t pd_start = start;
+
+ /* Page directories might not be present, since the macro rounds
+ * both down and up.
+ */
+ if (!pd) {
+ WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+ "PDPE %d is not allocated, but is reserved (%p)\n",
+ pdpe, vm);
+ continue;
+ } else {
+ WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+ "PDPE %d not reserved, but is allocated (%p)",
+ pdpe, vm);
+ }
+
gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
- free_pt_single(pt, vm->dev);
- pd->page_tables[pde] = NULL;
+ if (!pt) {
+ WARN(test_bit(pde, pd->used_pdes),
+ "PDE %d is not allocated, but is reserved (%p)\n",
+ pde, vm);
+ continue;
+ } else
+ WARN(!test_bit(pde, pd->used_pdes),
+ "PDE %d not reserved, but is allocated (%p)",
+ pde, vm);
+
+ bitmap_clear(pt->used_ptes,
+ gen8_pte_index(pd_start),
+ gen8_pte_count(pd_start, pd_len));
+
+ if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+ free_pt_single(pt, vm->dev);
+ pd->page_tables[pde] = NULL;
+ WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+ }
+ }
+
+ if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
+ free_pd_single(pd, vm->dev);
+ ppgtt->pdp.pagedirs[pdpe] = NULL;
+ WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
}
- free_pd_single(pd, vm->dev);
- ppgtt->pdp.pagedirs[pdpe] = NULL;
}
}
@@ -614,6 +663,7 @@ unwind_out:
return -ENOMEM;
}
+/* bitmap of new pagedirs */
static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
uint64_t start,
uint64_t length,
@@ -629,6 +679,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
BUG_ON(unused);
pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+
if (IS_ERR(pdp->pagedirs[pdpe]))
goto unwind_out;
}
@@ -650,10 +701,12 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_pagedir *pd;
const uint64_t orig_start = start;
+ const uint64_t orig_length = length;
uint64_t temp;
uint32_t pdpe;
int ret;
+ /* Do the allocations first so we can easily bail out */
ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
ppgtt->base.dev);
if (ret)
@@ -666,6 +719,26 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
goto err_out;
}
+ /* Now mark everything we've touched as used. This doesn't allow for
+ * robust error checking, but it makes the code a hell of a lot simpler.
+ */
+ start = orig_start;
+ length = orig_length;
+
+ gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+ struct i915_pagetab *pt;
+ uint64_t pd_len = gen8_clamp_pd(start, length);
+ uint64_t pd_start = start;
+ uint32_t pde;
+ gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
+ bitmap_set(pd->page_tables[pde]->used_ptes,
+ gen8_pte_index(start),
+ gen8_pte_count(start, length));
+ set_bit(pde, pd->used_pdes);
+ }
+ set_bit(pdpe, ppgtt->pdp.used_pdpes);
+ }
+
return 0;
err_out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index da27cc4..120f213 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -193,11 +193,13 @@ struct i915_pagedir {
dma_addr_t daddr;
};
+ unsigned long *used_pdes;
struct i915_pagetab *page_tables[I915_PDES_PER_PD];
};
struct i915_pagedirpo {
/* struct page *page; */
+ DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
};
@@ -459,6 +461,16 @@ static inline uint32_t gen8_pml4e_index(uint64_t address)
BUG();
}
+static inline size_t gen8_pte_count(uint64_t addr, uint64_t length)
+{
+ return i915_pte_count(addr, length, GEN8_PDE_SHIFT);
+}
+
+static inline size_t gen8_pde_count(uint64_t addr, uint64_t length)
+{
+ return i915_pde_count(addr, length, GEN8_PDE_SHIFT);
+}
+
int i915_gem_gtt_init(struct drm_device *dev);
void i915_gem_init_global_gtt(struct drm_device *dev);
int i915_gem_setup_global_gtt(struct drm_device *dev, unsigned long start,
--
2.0.3
* [RFC 37/38] drm/i915/bdw: Dynamic page table allocations
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (35 preceding siblings ...)
2014-10-07 17:11 ` [RFC 36/38] drm/i915/bdw: begin bitmap tracking Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-07 17:11 ` [RFC 38/38] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
` (2 subsequent siblings)
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
From: Ben Widawsky <benjamin.widawsky@intel.com>
This finishes off the dynamic page table allocations, in the legacy 3
level style that already exists. Almost everything has already been set up
at this point; the patch finishes off the enabling by setting the
appropriate function pointers.
Zombie tracking:
This could be a separate patch, but I found it helpful for debugging.
Since we write page tables asynchronously with respect to the GPU using
them, we can't actually free the page tables until we know the GPU won't
use them. With this patch, that is always when the context dies. It
would be possible to write a reaper to go through zombies and clean them
up when under memory pressure. That exercise is left for the reader.
Scratch unused pages:
The object pages can get freed even if a page table still points to
them. Like the zombie fix, we need to make sure we don't let our GPU
access arbitrary memory when we've unmapped things.
v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 379 +++++++++++++++++++++++++++++-------
drivers/gpu/drm/i915/i915_gem_gtt.h | 16 +-
2 files changed, 328 insertions(+), 67 deletions(-)
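The zombie state machine described above (and spelled out in the `i915_gem_gtt.h` comment below) can be sketched as a small userspace model. This is a hypothetical simplification — `present`, `in_bitmap` and the helper names are invented for illustration; the real code tracks the same states via `pd->used_pdes` and the `zombie` bitfield:

```c
#include <assert.h>
#include <stdbool.h>

/* Three legal states for a page table (scratch excluded):
 *   in_bitmap = 0, zombie = 0: unallocated
 *   in_bitmap = 1, zombie = 0: live
 *   in_bitmap = 0, zombie = 1: zombie (unmapped, but GPU may still walk it)
 */
struct pt_model {
	bool present;   /* backing page still allocated */
	bool in_bitmap; /* bit set in the parent's used bitmap */
	bool zombie;
};

/* Allocation reuses a zombie instead of allocating a fresh page. */
static void pt_alloc(struct pt_model *pt)
{
	pt->present = true;
	pt->in_bitmap = true;
	pt->zombie = false;
}

/* Teardown: if the context may still be live (!dead), we cannot free
 * the page the GPU might reference, so it becomes a zombie; only when
 * the context dies is it actually freed.
 */
static void pt_teardown(struct pt_model *pt, bool dead)
{
	if (!pt->present)
		return;
	pt->in_bitmap = false;
	if (!dead) {
		pt->zombie = true;
		return;
	}
	pt->present = false;
	pt->zombie = false;
}
```

A reaper under memory pressure, as the commit message suggests, would amount to calling the dead-path free on zombies once the GPU is known idle.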
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6703721..117b88a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -536,7 +536,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
}
}
-static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t * const pde,
struct i915_pagetab *pt,
struct drm_device *dev)
{
@@ -553,7 +553,7 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
uint64_t length,
struct drm_device *dev)
{
- gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+ gen8_ppgtt_pde_t * const pagedir = kmap_atomic(pd->page);
struct i915_pagetab *pt;
uint64_t temp, pde;
@@ -566,8 +566,9 @@ static void gen8_map_pagetable_range(struct i915_pagedir *pd,
kunmap_atomic(pagedir);
}
-static void gen8_teardown_va_range(struct i915_address_space *vm,
- uint64_t start, uint64_t length)
+static void __gen8_teardown_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length,
+ bool dead)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
@@ -589,6 +590,13 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
pdpe, vm);
continue;
} else {
+ if (dead && pd->zombie) {
+ WARN_ON(test_bit(pdpe, ppgtt->pdp.used_pdpes));
+ free_pd_single(pd, vm->dev);
+ ppgtt->pdp.pagedirs[pdpe] = NULL;
+ continue;
+ }
+
WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
"PDPE %d not reserved, but is allocated (%p)",
pdpe, vm);
@@ -600,34 +608,65 @@ static void gen8_teardown_va_range(struct i915_address_space *vm,
"PDE %d is not allocated, but is reserved (%p)\n",
pde, vm);
continue;
- } else
+ } else {
+ if (dead && pt->zombie) {
+ WARN_ON(test_bit(pde, pd->used_pdes));
+ free_pt_single(pt, vm->dev);
+ pd->page_tables[pde] = NULL;
+ continue;
+ }
WARN(!test_bit(pde, pd->used_pdes),
"PDE %d not reserved, but is allocated (%p)",
pde, vm);
+ }
bitmap_clear(pt->used_ptes,
gen8_pte_index(pd_start),
gen8_pte_count(pd_start, pd_len));
+
if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+ WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+ if (!dead) {
+ pt->zombie = 1;
+ continue;
+ }
free_pt_single(pt, vm->dev);
pd->page_tables[pde] = NULL;
- WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+
}
}
+ gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
+
if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
+ WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
+ if (!dead) {
+ /* We've unmapped a possibly live context. Make
+ * note of it so we can clean it up later. */
+ pd->zombie = 1;
+ continue;
+ }
free_pd_single(pd, vm->dev);
ppgtt->pdp.pagedirs[pdpe] = NULL;
- WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
}
}
}
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
+{
+ __gen8_teardown_va_range(vm, start, length, false);
+}
+
static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
{
- gen8_teardown_va_range(&ppgtt->base,
- ppgtt->base.start, ppgtt->base.total);
+ trace_i915_va_teardown(&ppgtt->base,
+ ppgtt->base.start, ppgtt->base.total,
+ VM_TO_TRACE_NAME(&ppgtt->base));
+ __gen8_teardown_va_range(&ppgtt->base,
+ ppgtt->base.start, ppgtt->base.total,
+ true);
}
static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -638,58 +677,167 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
gen8_ppgtt_free(ppgtt);
}
-static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt: Master ppgtt structure.
+ * @pd: Page directory for this address range.
+ * @start: Starting virtual address to begin allocations.
+ * @length: Size of the allocations.
+ * @new_pts: Bitmap set by function with new allocations. Likely used by the
+ * caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_pagedirs(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_pagedirs(), it is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+ struct i915_pagedir *pd,
uint64_t start,
uint64_t length,
- struct drm_device *dev)
+ unsigned long *new_pts)
{
- struct i915_pagetab *unused;
+ struct i915_pagetab *pt;
uint64_t temp;
uint32_t pde;
- gen8_for_each_pde(unused, pd, start, length, temp, pde) {
- BUG_ON(unused);
- pd->page_tables[pde] = alloc_pt_single(dev);
- if (IS_ERR(pd->page_tables[pde]))
+ gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+ /* Don't reallocate page tables */
+ if (pt) {
+ /* Scratch is never allocated this way */
+ WARN_ON(pt->scratch);
+ /* If there is a zombie, we can reuse it and save time
+ * on the allocation. If we clear the zombie status and
+ * the caller somehow fails, we'll probably hit some
+ * assertions, so it's up to them to fix up the bitmaps.
+ */
+ continue;
+ }
+
+ pt = alloc_pt_single(ppgtt->base.dev);
+ if (IS_ERR(pt))
goto unwind_out;
+
+ pd->page_tables[pde] = pt;
+ set_bit(pde, new_pts);
}
return 0;
unwind_out:
- while (pde--)
- free_pt_single(pd->page_tables[pde], dev);
+ for_each_set_bit(pde, new_pts, I915_PDES_PER_PD)
+ free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
return -ENOMEM;
}
-/* bitmap of new pagedirs */
-static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+/**
+ * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
+ * @ppgtt: Master ppgtt structure.
+ * @pdp: Page directory pointer for this address range.
+ * @start: Starting virtual address to begin allocations.
+ * @length: Size of the allocations.
+ * @new_pds: Bitmap set by function with new allocations. Likely used by the
+ * caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index of
+ * @start, and ending at the pde index @start + @length. This function will skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
+ struct i915_pagedirpo *pdp,
uint64_t start,
uint64_t length,
- struct drm_device *dev)
+ unsigned long *new_pds)
{
- struct i915_pagedir *unused;
+ struct i915_pagedir *pd;
uint64_t temp;
uint32_t pdpe;
+ BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
/* FIXME: PPGTT container_of won't work for 64b */
BUG_ON((start + length) > 0x800000000ULL);
- gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
- BUG_ON(unused);
- pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+ if (pd)
+ continue;
- if (IS_ERR(pdp->pagedirs[pdpe]))
+ pd = alloc_pd_single(ppgtt->base.dev);
+ if (IS_ERR(pd))
goto unwind_out;
+
+ pdp->pagedirs[pdpe] = pd;
+ set_bit(pdpe, new_pds);
}
return 0;
unwind_out:
- while (pdpe--)
- free_pd_single(pdp->pagedirs[pdpe], dev);
+ for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+ free_pd_single(pdp->pagedirs[pdpe], ppgtt->base.dev);
+
+ return -ENOMEM;
+}
+
+static inline void
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+{
+ int i;
+ for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+ kfree(new_pts[i]);
+ kfree(new_pts);
+ kfree(new_pds);
+}
+
+/* Fills in the page directory bitmap, and the array of page table bitmaps. Both
+ * of these are based on the number of PDPEs in the system.
+ */
+int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
+ unsigned long ***new_pts)
+{
+ int i;
+ unsigned long *pds;
+ unsigned long **pts;
+
+ pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+ if (!pds)
+ return -ENOMEM;
+
+ pts = kcalloc(I915_PDES_PER_PD, sizeof(unsigned long *), GFP_KERNEL);
+ if (!pts) {
+ kfree(pds);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+ pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!pts[i])
+ goto err_out;
+ }
+ *new_pds = pds;
+ *new_pts = (unsigned long **)pts;
+
+ return 0;
+
+err_out:
+ free_gen8_temp_bitmaps(pds, pts);
return -ENOMEM;
}
@@ -699,6 +847,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ unsigned long *new_page_dirs, **new_page_tables;
struct i915_pagedir *pd;
const uint64_t orig_start = start;
const uint64_t orig_length = length;
@@ -706,43 +855,103 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
uint32_t pdpe;
int ret;
- /* Do the allocations first so we can easily bail out */
- ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
- ppgtt->base.dev);
+#ifndef CONFIG_64BIT
+ /* Disallow 64b address on 32b platforms. Nothing is wrong with doing
+ * this in hardware, but a lot of the drm code is not prepared to handle
+ * 64b offset on 32b platforms. */
+ if (start + length > 0x100000000ULL)
+ return -E2BIG;
+#endif
+
+ /* Wrap is never okay since we can only represent 48b, and we don't
+ * actually use the other side of the canonical address space.
+ */
+ if (WARN_ON(start + length < start))
+ return -ERANGE;
+
+ ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
if (ret)
return ret;
+ /* Do the allocations first so we can easily bail out */
+ ret = gen8_ppgtt_alloc_pagedirs(ppgtt, &ppgtt->pdp, start, length,
+ new_page_dirs);
+ if (ret) {
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ return ret;
+ }
+
+ /* For every page directory referenced, allocate page tables */
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
- ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
- ppgtt->base.dev);
+ bitmap_zero(new_page_tables[pdpe], I915_PDES_PER_PD);
+ ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+ new_page_tables[pdpe]);
if (ret)
goto err_out;
}
- /* Now mark everything we've touched as used. This doesn't allow for
- * robust error checking, but it makes the code a hell of a lot simpler.
- */
start = orig_start;
length = orig_length;
+ /* Allocations have completed successfully, so set the bitmaps, and do
+ * the mappings. */
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+ gen8_ppgtt_pde_t *const pagedir = kmap_atomic(pd->page);
struct i915_pagetab *pt;
uint64_t pd_len = gen8_clamp_pd(start, length);
uint64_t pd_start = start;
uint32_t pde;
- gen8_for_each_pde(pt, &ppgtt->pd, pd_start, pd_len, temp, pde) {
- bitmap_set(pd->page_tables[pde]->used_ptes,
- gen8_pte_index(start),
- gen8_pte_count(start, length));
+
+ /* Every pd should be allocated, we just did that above. */
+ BUG_ON(!pd);
+
+ gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+ /* Same reasoning as pd */
+ BUG_ON(!pt);
+ BUG_ON(!pd_len);
+ BUG_ON(!gen8_pte_count(pd_start, pd_len));
+
+ /* Set our used ptes within the page table */
+ bitmap_set(pt->used_ptes,
+ gen8_pte_index(pd_start),
+ gen8_pte_count(pd_start, pd_len));
+
+ /* Our pde is now pointing to the pagetable, pt */
set_bit(pde, pd->used_pdes);
+
+ /* Map the PDE to the page table */
+ __gen8_do_map_pt(pagedir + pde, pt, vm->dev);
+
+ /* NB: We haven't yet mapped ptes to pages. At this
+ * point we're still relying on insert_entries() */
+
+ /* No longer possible this page table is a zombie */
+ pt->zombie = 0;
}
+
+ if (!HAS_LLC(vm->dev))
+ drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+ kunmap_atomic(pagedir);
+
set_bit(pdpe, ppgtt->pdp.used_pdpes);
+ /* This pd is officially not a zombie either */
+ ppgtt->pdp.pagedirs[pdpe]->zombie = 0;
}
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
return 0;
err_out:
- gen8_teardown_va_range(vm, orig_start, start);
+ while (pdpe--) {
+ for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES_PER_PD)
+ free_pt_single(pd->page_tables[temp], vm->dev);
+ }
+
+ for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+ free_pd_single(ppgtt->pdp.pagedirs[pdpe], vm->dev);
+
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
return ret;
}
@@ -753,37 +962,68 @@ err_out:
* space.
*
*/
-static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
+static int gen8_ppgtt_init_common(struct i915_hw_ppgtt *ppgtt, uint64_t size)
{
- struct i915_pagedir *pd;
- uint64_t temp, start = 0;
- const uint64_t orig_length = size;
- uint32_t pdpe;
- int ret;
+ ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+ if (IS_ERR(ppgtt->scratch_pd))
+ return PTR_ERR(ppgtt->scratch_pd);
ppgtt->base.start = 0;
ppgtt->base.total = size;
- ppgtt->base.clear_range = gen8_ppgtt_clear_range;
- ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+ ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+
ppgtt->switch_mm = gen8_mm_switch;
- ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
- if (IS_ERR(ppgtt->scratch_pd))
- return PTR_ERR(ppgtt->scratch_pd);
+ return 0;
+}
+
+static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+ struct drm_device *dev = ppgtt->base.dev;
+ struct drm_i915_private *dev_priv = dev->dev_private;
+ struct i915_pagedir *pd;
+ uint64_t temp, start = 0, size = dev_priv->gtt.base.total;
+ uint32_t pdpe;
+ int ret;
+ ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+ if (ret)
+ return ret;
+
+ /* Aliasing PPGTT has to always work and be mapped because of the way we
+ * use RESTORE_INHIBIT in the context switch. This will be fixed
+ * eventually. */
ret = gen8_alloc_va_range(&ppgtt->base, start, size);
if (ret) {
free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
return ret;
}
- start = 0;
- size = orig_length;
-
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, size, temp, pdpe)
gen8_map_pagetable_range(pd, start, size, ppgtt->base.dev);
+ ppgtt->base.allocate_va_range = NULL;
+ ppgtt->base.teardown_va_range = NULL;
+ ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+
+ return 0;
+}
+
+static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
+{
+ struct drm_device *dev = ppgtt->base.dev;
+ struct drm_i915_private *dev_priv = dev->dev_private;
+ int ret;
+
+ ret = gen8_ppgtt_init_common(ppgtt, dev_priv->gtt.base.total);
+ if (ret)
+ return ret;
+
+ ppgtt->base.allocate_va_range = gen8_alloc_va_range;
+ ppgtt->base.teardown_va_range = gen8_teardown_va_range;
+ ppgtt->base.clear_range = NULL;
+
return 0;
}
@@ -1315,9 +1555,9 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing)
if (ret)
return ret;
- ppgtt->base.allocate_va_range = gen6_alloc_va_range;
- ppgtt->base.teardown_va_range = gen6_teardown_va_range;
- ppgtt->base.clear_range = gen6_ppgtt_clear_range;
+ ppgtt->base.allocate_va_range = aliasing ? NULL : gen6_alloc_va_range;
+ ppgtt->base.teardown_va_range = aliasing ? NULL : gen6_teardown_va_range;
+ ppgtt->base.clear_range = aliasing ? gen6_ppgtt_clear_range : NULL;
ppgtt->base.insert_entries = gen6_ppgtt_insert_entries;
ppgtt->base.cleanup = gen6_ppgtt_cleanup;
ppgtt->base.start = 0;
@@ -1355,8 +1595,10 @@ static int __hw_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt,
if (INTEL_INFO(dev)->gen < 8)
return gen6_ppgtt_init(ppgtt, aliasing);
+ else if ((IS_GEN8(dev) || IS_GEN9(dev)) && aliasing)
+ return gen8_aliasing_ppgtt_init(ppgtt);
else if (IS_GEN8(dev) || IS_GEN9(dev))
- return gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total);
+ return gen8_ppgtt_init(ppgtt);
else
BUG();
}
@@ -1370,8 +1612,9 @@ int i915_ppgtt_init(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt)
kref_init(&ppgtt->ref);
drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
ppgtt->base.total);
- ppgtt->base.clear_range(&ppgtt->base, 0,
- ppgtt->base.total, true);
+ if (ppgtt->base.clear_range)
+ ppgtt->base.clear_range(&ppgtt->base, 0,
+ ppgtt->base.total, true);
i915_init_vm(dev_priv, &ppgtt->base);
}
@@ -1489,10 +1732,7 @@ ppgtt_bind_vma(struct i915_vma *vma,
static void ppgtt_unbind_vma(struct i915_vma *vma)
{
- vma->vm->clear_range(vma->vm,
- vma->node.start,
- vma->obj->base.size,
- true);
+ WARN_ON(vma->vm->teardown_va_range && vma->vm->clear_range);
if (vma->vm->teardown_va_range) {
trace_i915_va_teardown(vma->vm,
vma->node.start, vma->node.size,
@@ -1501,7 +1741,14 @@ static void ppgtt_unbind_vma(struct i915_vma *vma)
vma->vm->teardown_va_range(vma->vm,
vma->node.start, vma->node.size);
ppgtt_invalidate_tlbs(vma->vm);
- }
+ } else if (vma->vm->clear_range) {
+ vma->vm->clear_range(vma->vm,
+ vma->node.start,
+ vma->obj->base.size,
+ true);
+ } else
+ BUG();
+
}
extern int intel_iommu_gfx_mapped;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 120f213..947d214 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -177,13 +177,26 @@ struct i915_vma {
u32 flags);
};
-
+/* Zombies. We write page tables with the CPU, and the GPU switches between
+ * them in hardware. As such, the only time we can safely remove a page
+ * table is when we know the context is idle. Since we have no good way to
+ * do this, we use the zombie.
+ *
+ * Under memory pressure, if the system is idle, zombies may be reaped.
+ *
+ * There are 3 states a page table can be in (not including scratch)
+ * bitmap = 0, zombie = 0: unallocated
+ * bitmap = 1, zombie = 0: allocated
+ * bitmap = 0, zombie = 1: zombie
+ * bitmap = 1, zombie = 1: invalid
+ */
struct i915_pagetab {
struct page *page;
dma_addr_t daddr;
unsigned long *used_ptes;
unsigned int scratch:1;
+ unsigned zombie:1;
};
struct i915_pagedir {
@@ -195,6 +208,7 @@ struct i915_pagedir {
unsigned long *used_pdes;
struct i915_pagetab *page_tables[I915_PDES_PER_PD];
+ unsigned zombie:1;
};
struct i915_pagedirpo {
--
2.0.3
* [RFC 38/38] drm/i915/bdw: Dynamic page table allocations in lrc mode
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (36 preceding siblings ...)
2014-10-07 17:11 ` [RFC 37/38] drm/i915/bdw: Dynamic page table allocations Michel Thierry
@ 2014-10-07 17:11 ` Michel Thierry
2014-10-08 7:13 ` [RFC 00/38] PPGTT dynamic page allocations Chris Wilson
2014-11-04 12:54 ` Daniel Vetter
39 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-10-07 17:11 UTC (permalink / raw)
To: intel-gfx
Logical ring contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet.
Check whether the PDPs have been allocated, and use the scratch page if
they do not exist yet.
Before submission, update the PDPs in the logical ring context, as the
PDPs may have been allocated since the context was populated.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/intel_lrc.c | 80 +++++++++++++++++++++++++++++++++++-----
1 file changed, 70 insertions(+), 10 deletions(-)
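The four near-identical `if (test_bit(n, ...))` blocks in the diff below all do the same thing: per PDP, pick either the allocated page directory's DMA address or the scratch directory's, then split it into upper/lower 32-bit register halves. A hedged userspace sketch of that selection — the struct and function names here are invented for illustration; the kernel writes `reg_state[CTX_PDPn_UDW/LDW + 1]` via `upper_32_bits()`/`lower_32_bits()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GEN8_LEGACY_PDPES 4

/* Toy model of the PDP state consulted when populating the context. */
struct pdp_model {
	bool allocated[GEN8_LEGACY_PDPES];     /* used_pdpes bits */
	uint64_t daddr[GEN8_LEGACY_PDPES];     /* pagedirs[n]->daddr */
};

/* Fill per-PDP register halves, falling back to the scratch page
 * directory for any PDP that has not been allocated yet.
 */
static void fill_pdp_regs(const struct pdp_model *pdp,
			  uint64_t scratch_daddr,
			  uint32_t udw[GEN8_LEGACY_PDPES],
			  uint32_t ldw[GEN8_LEGACY_PDPES])
{
	for (int i = 0; i < GEN8_LEGACY_PDPES; i++) {
		uint64_t addr = pdp->allocated[i] ? pdp->daddr[i]
						  : scratch_daddr;
		udw[i] = (uint32_t)(addr >> 32); /* upper_32_bits() */
		ldw[i] = (uint32_t)addr;         /* lower_32_bits() */
	}
}
```

Written as a loop like this, the fallback logic is stated once instead of four times; the patch keeps the unrolled form to match the surrounding register-offset macros.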
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b426fe6..edbc35e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -341,6 +341,7 @@ static void execlists_elsp_write(struct intel_engine_cs *ring,
static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
struct drm_i915_gem_object *ring_obj,
+ struct i915_hw_ppgtt *ppgtt,
u32 tail)
{
struct page *page;
@@ -352,6 +353,40 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
reg_state[CTX_RING_TAIL+1] = tail;
reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
+ /* True PPGTT with dynamic page allocation: update PDP registers and
+ * point the unallocated PDPs to the scratch page
+ */
+ if (ppgtt) {
+ if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
+ } else {
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
+ } else {
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
+ } else {
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
+ } else {
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ }
+
kunmap_atomic(reg_state);
return 0;
@@ -370,7 +405,7 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
WARN_ON(!i915_gem_obj_is_pinned(ctx_obj0));
WARN_ON(!i915_gem_obj_is_pinned(ringbuf0->obj));
- execlists_update_context(ctx_obj0, ringbuf0->obj, tail0);
+ execlists_update_context(ctx_obj0, ringbuf0->obj, to0->ppgtt, tail0);
if (to1) {
ringbuf1 = to1->engine[ring->id].ringbuf;
@@ -379,7 +414,7 @@ static int execlists_submit_context(struct intel_engine_cs *ring,
WARN_ON(!i915_gem_obj_is_pinned(ctx_obj1));
WARN_ON(!i915_gem_obj_is_pinned(ringbuf1->obj));
- execlists_update_context(ctx_obj1, ringbuf1->obj, tail1);
+ execlists_update_context(ctx_obj1, ringbuf1->obj, to1->ppgtt, tail1);
}
execlists_elsp_write(ring, ctx_obj0, ctx_obj1);
@@ -1719,14 +1754,39 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
- reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
- reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
- reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
- reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
- reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
- reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
- reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
+
+ /* With dynamic page allocation, PDPs may not be allocated at this point.
+ * Point the unallocated PDPs to the scratch page.
+ */
+ if (test_bit(3, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[3]->daddr);
+ } else {
+ reg_state[CTX_PDP3_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP3_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ if (test_bit(2, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[2]->daddr);
+ } else {
+ reg_state[CTX_PDP2_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP2_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ if (test_bit(1, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[1]->daddr);
+ } else {
+ reg_state[CTX_PDP1_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP1_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+ if (test_bit(0, ppgtt->pdp.used_pdpes)) {
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->pdp.pagedirs[0]->daddr);
+ } else {
+ reg_state[CTX_PDP0_UDW+1] = upper_32_bits(ppgtt->scratch_pd->daddr);
+ reg_state[CTX_PDP0_LDW+1] = lower_32_bits(ppgtt->scratch_pd->daddr);
+ }
+
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
--
2.0.3
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (37 preceding siblings ...)
2014-10-07 17:11 ` [RFC 38/38] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
@ 2014-10-08 7:13 ` Chris Wilson
2014-11-04 12:44 ` Daniel Vetter
2014-11-04 12:54 ` Daniel Vetter
39 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2014-10-08 7:13 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
> This is based on the first 55 patches of Ben's 48b addressing work, taking
> into consideration the latest changes in (mainly aliasing) ppgtt rules.
>
> Because of these changes in the tree, the first 17 patches of the original
> series are no longer needed, and some patches required more rework than others.
>
> For GEN8, it has also been extended to work in logical ring submission (lrc)
> mode, as it looks like it will be the preferred mode of operation.
> I also tried to update the lrc code at the same time the ppgtt refactoring
> occurred, leaving only one patch that is exclusively for lrc.
>
> I'm asking for comments, as this is the foundation for 48b virtual addressing
> in Broadwell.
I find the lack of activity tracking in the current ppgtt design severely
limiting. We have a number of tests (both igt and mesa) that fail
because the ppgtt pins gtt space for its lifetime. Transitioning the
backing pages to a bo allows us to evict, and even shrink, vm along with
regular objects. Plus the dynamic allocation here has also been
discussed with the idea of sparse allocation of bo... Imo, we want to
use bo (probably based on gemfs) for both.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [RFC 01/38] drm/i915: Add some extra guards in evict_vm
2014-10-07 17:10 ` [RFC 01/38] drm/i915: Add some extra guards in evict_vm Michel Thierry
@ 2014-10-08 13:36 ` Daniel Vetter
0 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2014-10-08 13:36 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:10:57PM +0100, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_evict.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index 886ff2e..7fd8b9b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -214,6 +214,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
> struct i915_vma *vma, *next;
> int ret;
>
> + BUG_ON(!mutex_is_locked(&vm->dev->struct_mutex));
No BUG_ON if it means a potential soft failure becomes a hard failure. A
lot of our code runs (at least at driver load time) under the console_lock.
Which means that if you die with a BUG your system is completely dead.
WARN_ON is perfectly fine here.
-Daniel
> trace_i915_gem_evict_vm(vm);
>
> if (do_idle) {
> @@ -222,6 +223,8 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
> return ret;
>
> i915_gem_retire_requests(vm->dev);
> +
> + WARN_ON(!list_empty(&vm->active_list));
> }
>
> list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
> --
> 2.0.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 04/38] drm/i915: Make pin global flags explicit
2014-10-07 17:11 ` [RFC 04/38] drm/i915: Make pin global flags explicit Michel Thierry
@ 2014-10-08 13:36 ` Daniel Vetter
0 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2014-10-08 13:36 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:11:00PM +0100, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> The driver currently lets callers pin global, and then tries to do
> things correctly inside the function. Doing so has two downsides:
> 1. It's not possible to exclusively pin to a global or an aliasing
> address space.
> 2. It's difficult to read and understand.
>
> The eventual goal, when realized, should fix both of these issues. This
> patch, which should have no functional impact, begins to address them
> without intentionally breaking things.
>
> v2: Replace PIN_GLOBAL with PIN_ALIASING in _pin(). Copy paste error
>
> v3: Rebased/reworked with flag conflict from negative relocations
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++------
> drivers/gpu/drm/i915/i915_gem.c | 31 +++++++++++++++++++++++-------
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ++-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++++++++++--
> drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++++-
> 5 files changed, 49 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3c725ec..6b60e90 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2396,11 +2396,13 @@ void i915_init_vm(struct drm_i915_private *dev_priv,
> void i915_gem_free_object(struct drm_gem_object *obj);
> void i915_gem_vma_destroy(struct i915_vma *vma);
>
> -#define PIN_MAPPABLE 0x1
> -#define PIN_NONBLOCK 0x2
> -#define PIN_GLOBAL 0x4
> -#define PIN_OFFSET_BIAS 0x8
> -#define PIN_OFFSET_MASK (~4095)
> +#define PIN_MAPPABLE (1<<0)
> +#define PIN_NONBLOCK (1<<1)
> +#define PIN_GLOBAL (1<<2)
> +#define PIN_ALIASING (1<<3)
> +#define PIN_GLOBAL_ALIASED (PIN_ALIASING | PIN_GLOBAL)
> +#define PIN_OFFSET_BIAS (1<<4)
> +#define PIN_OFFSET_MASK (PAGE_MASK)
#define rename should be split out.
> int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
> struct i915_address_space *vm,
> uint32_t alignment,
> @@ -2618,7 +2620,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
> unsigned flags)
> {
> return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj),
> - alignment, flags | PIN_GLOBAL);
> + alignment, flags | PIN_GLOBAL_ALIASED);
> }
>
> static inline int
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7745d22..dfb20e6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3421,8 +3421,12 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> unsigned long end =
> flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
> struct i915_vma *vma;
> + u32 vma_bind_flags = 0;
> int ret;
>
> + if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
> + flags |= PIN_GLOBAL;
> +
> fence_size = i915_gem_get_gtt_size(dev,
> obj->base.size,
> obj->tiling_mode);
> @@ -3508,9 +3512,11 @@ search_free:
>
> WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
>
> + if (flags & PIN_GLOBAL_ALIASED)
> + vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
> +
> trace_i915_vma_bind(vma, flags);
> - i915_gem_vma_bind(vma, obj->cache_level,
> - flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
> + i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
>
> return vma;
>
> @@ -3716,9 +3722,14 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> }
>
> list_for_each_entry(vma, &obj->vma_list, vma_link)
> - if (drm_mm_node_allocated(&vma->node))
> - i915_gem_vma_bind(vma, cache_level,
> - obj->has_global_gtt_mapping ? GLOBAL_BIND : 0);
> + if (drm_mm_node_allocated(&vma->node)) {
> + u32 bind_flags = 0;
> + if (obj->has_global_gtt_mapping)
> + bind_flags |= GLOBAL_BIND;
> + if (obj->has_aliasing_ppgtt_mapping)
> + bind_flags |= ALIASING_BIND;
> + i915_gem_vma_bind(vma, cache_level, bind_flags);
We should have a vma_rebind function for use here and in the gtt restore
code.
> + }
> }
>
> list_for_each_entry(vma, &obj->vma_list, vma_link)
> @@ -4114,8 +4125,14 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
> return PTR_ERR(vma);
> }
>
> - if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
> - i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
> + if (flags & PIN_GLOBAL_ALIASED) {
> + u32 bind_flags = 0;
> + if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
> + bind_flags |= GLOBAL_BIND;
> + if (flags & PIN_ALIASING && !obj->has_aliasing_ppgtt_mapping)
> + bind_flags |= ALIASING_BIND;
> + i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
> + }
>
> vma->pin_count++;
> if (flags & PIN_MAPPABLE)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 4564988..92191f0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -362,7 +362,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> list_first_entry(&target_i915_obj->vma_list,
> typeof(*vma), vma_link);
> i915_gem_vma_bind(vma, target_i915_obj->cache_level,
> - GLOBAL_BIND);
> + GLOBAL_BIND | ALIASING_BIND);
If you read the comment above then it's clear we actually want a global
binding here specifically. Adding the aliasing_bind flag is confusing.
> }
>
> /* Validate that the target is in a valid r/w GPU domain */
> @@ -533,6 +533,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> flags = 0;
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> flags |= PIN_MAPPABLE;
> + /* FIXME: What kind of bind does Chris want? */
This is the place where the aliasing bind should have been. Presuming
follow-up patches indeed rework the binding then this will badly break
snb. Or any gen7 machine booted with i915.enable_ppgtt=1.
> if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> flags |= PIN_GLOBAL;
> if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0c203f4..d725883 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1336,8 +1336,16 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
> * Unfortunately above, we've just wiped out the mappings
> * without telling our object about it. So we need to fake it.
> */
> - obj->has_global_gtt_mapping = 0;
> - i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
> + if (obj->has_global_gtt_mapping || obj->has_aliasing_ppgtt_mapping) {
> + u32 bind_flags = 0;
> + if (obj->has_global_gtt_mapping)
> + bind_flags |= GLOBAL_BIND;
> + if (obj->has_aliasing_ppgtt_mapping)
> + bind_flags |= ALIASING_BIND;
> + obj->has_global_gtt_mapping = 0;
> + obj->has_aliasing_ppgtt_mapping = 0;
> + i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
> + }
> }
>
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d5c14af..5fd7fa9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -155,8 +155,12 @@ struct i915_vma {
> * setting the valid PTE entries to a reserved scratch page. */
> void (*unbind_vma)(struct i915_vma *vma);
> /* Map an object into an address space with the given cache flags. */
> +
> +/* Only use this if you know you want a strictly global binding */
> #define GLOBAL_BIND (1<<0)
> -#define PTE_READ_ONLY (1<<1)
> +/* Only use this if you know you want a strictly aliased binding */
> +#define ALIASING_BIND (1<<1)
> +#define PTE_READ_ONLY (1<<2)
> void (*bind_vma)(struct i915_vma *vma,
> enum i915_cache_level cache_level,
> u32 flags);
> --
> 2.0.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 05/38] drm/i915: Split out aliasing binds
2014-10-07 17:11 ` [RFC 05/38] drm/i915: Split out aliasing binds Michel Thierry
@ 2014-10-08 13:41 ` Daniel Vetter
0 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2014-10-08 13:41 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:11:01PM +0100, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> This patch finishes off actually separating the aliasing and global
> binds. Prior to this, all global binds would be aliased. Now if aliasing
> binds are required, they must be explicitly asked for. So far, we have
> no users of this outside of execbuf - but Mika has already submitted a
> patch requiring just this.
>
> A nice benefit of this is we should no longer be able to clobber GTT
> only objects from the aliasing PPGTT.
>
> v2: Only add aliasing binds for the GGTT/Aliasing PPGTT at execbuf
>
> v3: Rebase resolution with changed size of flags
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 +-
> drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
> drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
> 4 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 6b60e90..c0fea18 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2620,7 +2620,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
> unsigned flags)
> {
> return i915_gem_object_pin(obj, i915_obj_to_ggtt(obj),
> - alignment, flags | PIN_GLOBAL_ALIASED);
> + alignment, flags | PIN_GLOBAL);
This hunk looks like a patch split mixup and should probably be in the
previous patch.
Also I'm not really clear on all these flags and what they should do with
pure ppgtt/pure ggtt address spaces. We probably need to lock down abuse
(well, potential bugs) through copious sprinkling of WARN_ONs.
-Daniel
> }
>
> static inline int
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index dfb20e6..98186b2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3512,8 +3512,10 @@ search_free:
>
> WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
>
> - if (flags & PIN_GLOBAL_ALIASED)
> - vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
> + if (flags & PIN_ALIASING)
> + vma_bind_flags = ALIASING_BIND;
> + if (flags & PIN_GLOBAL)
> + vma_bind_flags = GLOBAL_BIND;
>
> trace_i915_vma_bind(vma, flags);
> i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 92191f0..d3a89e6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -527,10 +527,11 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> {
> struct drm_i915_gem_object *obj = vma->obj;
> struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
> - uint64_t flags;
> + uint64_t flags = 0;
> int ret;
>
> - flags = 0;
> + if (i915_is_ggtt(vma->vm))
> + flags = PIN_ALIASING;
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> flags |= PIN_MAPPABLE;
> /* FIXME: What kind of bind does Chris want? */
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index d725883..ac0197f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1597,6 +1597,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
> }
> }
>
> + if (!(flags & ALIASING_BIND))
> + return;
> +
> if (dev_priv->mm.aliasing_ppgtt &&
> (!obj->has_aliasing_ppgtt_mapping ||
> (cache_level != obj->cache_level))) {
> --
> 2.0.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 06/38] drm/i915: fix gtt_total_entries()
2014-10-07 17:11 ` [RFC 06/38] drm/i915: fix gtt_total_entries() Michel Thierry
@ 2014-10-08 13:52 ` Daniel Vetter
0 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2014-10-08 13:52 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:11:02PM +0100, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> It's useful to have it not as a macro for some upcoming work. Generally
> since we try to avoid macros anyway, I think it doesn't hurt to put this
> as its own patch.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++--
> drivers/gpu/drm/i915/i915_gem_gtt.h | 8 ++++++--
> drivers/gpu/drm/i915/i915_gem_stolen.c | 2 +-
> 3 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ac0197f..f677deb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1489,7 +1489,7 @@ static void gen8_ggtt_clear_range(struct i915_address_space *vm,
> unsigned num_entries = length >> PAGE_SHIFT;
> gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
> (gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
> - const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> + const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
> int i;
>
> if (WARN(num_entries > max_entries,
> @@ -1515,7 +1515,7 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
> unsigned num_entries = length >> PAGE_SHIFT;
> gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
> (gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
> - const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
> + const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
> int i;
>
> if (WARN(num_entries > max_entries,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 5fd7fa9..98427ce 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -40,8 +40,6 @@ typedef uint32_t gen6_gtt_pte_t;
> typedef uint64_t gen8_gtt_pte_t;
> typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
>
> -#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
> -
> #define I915_PPGTT_PT_ENTRIES (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
> /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
> #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0))
> @@ -284,6 +282,12 @@ int i915_ppgtt_init_hw(struct drm_device *dev);
> void i915_ppgtt_release(struct kref *kref);
> struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_device *dev,
> struct drm_i915_file_private *fpriv);
> +
> +static inline size_t gtt_total_entries(struct i915_gtt *gtt)
Namespacing of non-file-private functions missing. Might still have some
offenders right around, so please fix those up too.
-Daniel
> +{
> + return gtt->base.total >> PAGE_SHIFT;
> +}
> +
> static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
> {
> if (ppgtt)
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 85fda6b..4e1b22e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -90,7 +90,7 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
> (gtt_start & PGTBL_ADDRESS_HI_MASK) << 28;
> else
> gtt_start &= PGTBL_ADDRESS_LO_MASK;
> - gtt_end = gtt_start + gtt_total_entries(dev_priv->gtt) * 4;
> + gtt_end = gtt_start + gtt_total_entries(&dev_priv->gtt) * 4;
>
> if (gtt_start >= stolen[0].start && gtt_start < stolen[0].end)
> stolen[0].end = gtt_start;
> --
> 2.0.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 09/38] drm/i915: s/pd/pdpe, s/pt/pde
2014-10-07 17:11 ` [RFC 09/38] drm/i915: s/pd/pdpe, s/pt/pde Michel Thierry
@ 2014-10-08 13:55 ` Daniel Vetter
0 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2014-10-08 13:55 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:11:05PM +0100, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> The actual correct way to think about this with the new style of page
> table data structures is as the actual entry that is being indexed into
> the array. "pd", and "pt" aren't representative of what the operation is
> doing.
>
> The clarity here will improve the readability of future patches.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Given that I don't know what pdpe means, I disagree that this improves
clarity really. And generally in the core vm the p*e are pointers to the
actual entry itself, not the index like here.
So if we want this it needs to at least better sell the advantage.
-Daniel
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0ee258b..12da57a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -502,40 +502,40 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
> }
>
> static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
> - const int pd)
> + const int pdpe)
> {
> dma_addr_t pd_addr;
> int ret;
>
> pd_addr = pci_map_page(ppgtt->base.dev->pdev,
> - &ppgtt->pd_pages[pd], 0,
> + &ppgtt->pd_pages[pdpe], 0,
> PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>
> ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
> if (ret)
> return ret;
>
> - ppgtt->pd_dma_addr[pd] = pd_addr;
> + ppgtt->pd_dma_addr[pdpe] = pd_addr;
>
> return 0;
> }
>
> static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> - const int pd,
> - const int pt)
> + const int pdpe,
> + const int pde)
> {
> dma_addr_t pt_addr;
> struct page *p;
> int ret;
>
> - p = ppgtt->gen8_pt_pages[pd][pt];
> + p = ppgtt->gen8_pt_pages[pdpe][pde];
> pt_addr = pci_map_page(ppgtt->base.dev->pdev,
> p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
> if (ret)
> return ret;
>
> - ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
> + ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
>
> return 0;
> }
> --
> 2.0.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 13/38] drm/i915: Make gen6_write_pdes gen6_map_page_tables
2014-10-07 17:11 ` [RFC 13/38] drm/i915: Make gen6_write_pdes gen6_map_page_tables Michel Thierry
@ 2014-10-08 14:04 ` Daniel Vetter
0 siblings, 0 replies; 53+ messages in thread
From: Daniel Vetter @ 2014-10-08 14:04 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:11:09PM +0100, Michel Thierry wrote:
> From: Ben Widawsky <benjamin.widawsky@intel.com>
>
> Split out single mappings which will help with upcoming work. Also while
> here, rename the function because it is a better description - but this
> function is going away soon.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++++++++++++++++++++++---------------
> 1 file changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 00b5e5a..f5a1ac9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -678,26 +678,33 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
> }
> }
>
> -static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
> +static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
> + const unsigned pde_index,
> + dma_addr_t daddr)
Maybe this will unravel later on, but I disagree on the naming scheme
here. Thus far we had
- bind/unbind for high-level/logical mapping on the gpu side.
- write/clear_entry for low-level pte munging on the gpu (this stuff
here).
- dma_map/unmap for iommu mappings.
So using map/unmap here is fairly confusing.
Now if we actually want to bikeshed these functions we should switch
from the pci_map/unmap to the dma_map/unmap functions.
-Daniel
> {
> struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> - gen6_gtt_pte_t __iomem *pd_addr;
> uint32_t pd_entry;
> + gen6_gtt_pte_t __iomem *pd_addr =
> + (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
> +
> + pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
> + pd_entry |= GEN6_PDE_VALID;
> +
> + writel(pd_entry, pd_addr + pde_index);
> +}
> +
> +/* Map all the page tables found in the ppgtt structure to incrementing page
> + * directories. */
> +static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
> +{
> + struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
> int i;
>
> WARN_ON(ppgtt->pd_offset & 0x3f);
> - pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
> - ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
> - for (i = 0; i < ppgtt->num_pd_entries; i++) {
> - dma_addr_t pt_addr;
> -
> - pt_addr = ppgtt->pt_dma_addr[i];
> - pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
> - pd_entry |= GEN6_PDE_VALID;
> + for (i = 0; i < ppgtt->num_pd_entries; i++)
> + gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
>
> - writel(pd_entry, pd_addr + i);
> - }
> - readl(pd_addr);
> + readl(dev_priv->gtt.gsm);
> }
>
> static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
> @@ -1087,7 +1094,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> ppgtt->node.size >> 20,
> ppgtt->node.start / PAGE_SIZE);
>
> - gen6_write_pdes(ppgtt);
> + gen6_map_page_tables(ppgtt);
> DRM_DEBUG("Adding PPGTT at offset %x\n",
> ppgtt->pd_offset << 10);
>
> @@ -1365,11 +1372,11 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
> /* TODO: Perhaps it shouldn't be gen6 specific */
> if (i915_is_ggtt(vm)) {
> if (dev_priv->mm.aliasing_ppgtt)
> - gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
> + gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
> continue;
> }
>
> - gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
> + gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
> }
>
> i915_ggtt_flush(dev_priv);
> --
> 2.0.3
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-10-08 7:13 ` [RFC 00/38] PPGTT dynamic page allocations Chris Wilson
@ 2014-11-04 12:44 ` Daniel Vetter
2014-11-04 13:01 ` Chris Wilson
0 siblings, 1 reply; 53+ messages in thread
From: Daniel Vetter @ 2014-11-04 12:44 UTC (permalink / raw)
To: Chris Wilson, Michel Thierry, intel-gfx
On Wed, Oct 08, 2014 at 08:13:33AM +0100, Chris Wilson wrote:
> On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
> > This is based on the first 55 patches of Ben's 48b addressing work, taking
> > into consideration the latest changes in (mainly aliasing) ppgtt rules.
> >
> > Because of these changes in the tree, the first 17 patches of the original
> > series are no longer needed, and some patches required more rework than others.
> >
> > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > mode, as it looks like it will be the preferred mode of operation.
> > I also tried to update the lrc code at the same time the ppgtt refactoring
> > occurred, leaving only one patch that is exclusively for lrc.
> >
> > I'm asking for comments, as this is the foundation for 48b virtual addressing
> > in Broadwell.
>
> I find the lack of activity tracking in the current ppgtt design severely
> limiting. We have a number of tests (both igt and mesa) that fail
> because the ppgtt pins gtt space for its lifetime. Transitioning the
> backing pages to a bo allows us to evict, and even shrink, vm along with
> regular objects. Plus the dynamic allocation here has also been
> discussed with the idea of sparse allocation of bo... Imo, we want to
> use bo (probably based on gemfs) for both.
Picking up an old story ... I guess you're talking about the PD
reservation ppgtt needs on gen7, which is stolen from the GGTT?
One totally crazy idea I've had is to add an ->evict function to the vma
and just use the vma to track this stuff, with no object attached. That
should be enough for the shrinker, presuming we wrap enough code into the
optional ->evict callback. By default it'd do the normal bo evict. And
with a vfunc ->evict we could also subsume the pageflip stall and ctx
switch tricks into the same infrastructure.
Just food for thoughts.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
` (38 preceding siblings ...)
2014-10-08 7:13 ` [RFC 00/38] PPGTT dynamic page allocations Chris Wilson
@ 2014-11-04 12:54 ` Daniel Vetter
2014-11-04 16:29 ` Michel Thierry
39 siblings, 1 reply; 53+ messages in thread
From: Daniel Vetter @ 2014-11-04 12:54 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
> This is based on the first 55 patches of Ben's 48b addressing work, taking
> into consideration the latest changes in (mainly aliasing) ppgtt rules.
>
> Because of these changes in the tree, the first 17 patches of the original
> series are no longer needed, and some patches required more rework than others.
>
> For GEN8, it has also been extended to work in logical ring submission (lrc)
> mode, as it looks like it will be the preferred mode of operation.
> I also tried to update the lrc code at the same time the ppgtt refactoring
> occurred, leaving only one patch that is exclusively for lrc.
>
> I'm asking for comments, as this is the foundation for 48b virtual addressing
> in Broadwell.
>
> This list can be seen in 3 parts:
> [01-24] Include code rework for PPGTT (all GENs).
> [25-28] Adds page table allocation for GEN6/GEN7
> [29-38] Enables dynamic allocation in GEN8. It is enabled for both legacy
> and execlist submission modes.
>
> Ben Widawsky (37):
> drm/i915: Add some extra guards in evict_vm
> drm/i915/trace: Fix offsets for 64b
> drm/i915: Wrap VMA binding
> drm/i915: Make pin global flags explicit
> drm/i915: Split out aliasing binds
> drm/i915: fix gtt_total_entries()
> drm/i915: Rename to GEN8_LEGACY_PDPES
> drm/i915: Split out verbose PPGTT dumping
> drm/i915: s/pd/pdpe, s/pt/pde
> drm/i915: rename map/unmap to dma_map/unmap
> drm/i915: Setup less PPGTT on failed pagedir
> drm/i915: Un-hardcode number of page directories
> drm/i915: Make gen6_write_pdes gen6_map_page_tables
> drm/i915: Range clearing is PPGTT agnostic
> drm/i915: Page table helpers, and define renames
> drm/i915: construct page table abstractions
> drm/i915: Complete page table structures
> drm/i915: Create page table allocators
> drm/i915: Generalize GEN6 mapping
> drm/i915: Clean up pagetable DMA map & unmap
> drm/i915: Always dma map page table allocations
> drm/i915: Consolidate dma mappings
> drm/i915: Always dma map page directory allocations
> drm/i915: Track GEN6 page table usage
> drm/i915: Extract context switch skip logic
> drm/i915: Track page table reload need
> drm/i915: Initialize all contexts
> drm/i915: Finish gen6/7 dynamic page table allocation
> drm/i915/bdw: Use dynamic allocation idioms on free
> drm/i915/bdw: pagedirs rework allocation
> drm/i915/bdw: pagetable allocation rework
> drm/i915/bdw: Make the pdp switch a bit less hacky
> drm/i915: num_pd_pages/num_pd_entries isn't useful
> drm/i915: Extract PPGTT param from pagedir alloc
> drm/i915/bdw: Split out mappings
> drm/i915/bdw: begin bitmap tracking
> drm/i915/bdw: Dynamic page table allocations
>
> Michel Thierry (1):
> drm/i915/bdw: Dynamic page table allocations in lrc mode
Ok, high level review:
- The first part of this series seems to shuffle the code around in the
vma binding code. If we actually want to fix this I think we need a
REBIND flag and push the logic for rewriting ptes into the vma_bind
hooks. There's no other way really to fix this, and the breakage for
aliasing ppgtt this current code produces is what's blocking the cmd
parser atm.
- Imo merging the vma_bind/insert_entries hooks isn't useful; they
provide a good abstraction. Ofc we should move the vma_bind/unbind
functions into the vm functions, since the vfunc dictionary varies by vm
and not by vma. And they only really provide a good abstraction if we
first fix up the binding mess.
- The code massively shuffles around the pte and page table handling code,
promising a much better future. But I simply don't get it - to me this
all looks like massive amounts of churn for no clear gain.
- Imo the really critical part of dynamic pagetable alloc is where exactly
we add this new memory allocation point and how we handle failures. But
since there's so much churn that I somehow can't see through I can't
have a solid opinion on the proposed design.
So overall I think we should untangle this series first and throw out as
much churn as possible. E.g. the binding rework is very much separate imo.
Or someone needs to explain to me why we need all this code reflow.
With that out of the way reviewing the changes due to dynamic page table
alloc should be fairly simple. My expectation would have been that we'd
add a new interface to allocate vm ranges and essentially leave all the
current vma binding unchanged. Having the separate vm range extension is
somewhat important since the drm_mm allocator has a bit of a tendency to
just walk up the address space, wasting piles of free space lower down.
Or like I've said, maybe I just don't see the real issues and have a way
too naive opinion here.
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-11-04 12:44 ` Daniel Vetter
@ 2014-11-04 13:01 ` Chris Wilson
2014-11-05 9:19 ` Daniel Vetter
0 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2014-11-04 13:01 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Tue, Nov 04, 2014 at 01:44:47PM +0100, Daniel Vetter wrote:
> On Wed, Oct 08, 2014 at 08:13:33AM +0100, Chris Wilson wrote:
> > On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
> > > This is based on the first 55 patches of Ben's 48b addressing work, taking
> > > into consideration the latest changes in (mainly aliasing) ppgtt rules.
> > >
> > > Because of these changes in the tree, the first 17 patches of the original
> > > series are no longer needed, and some patches required more rework than others.
> > >
> > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > mode, as it looks like it will be the preferred mode of operation.
> > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > occurred, leaving only one patch that is exclusively for lrc.
> > >
> > > I'm asking for comments, as this is the foundation for 48b virtual addressing
> > > in Broadwell.
> >
> > I find the lack of activity tracking in the current ppgtt design severely
> > limiting. We have a number of tests (both igt and mesa) that fail
> > because the ppgtt pins gtt space for its lifetime. Transitioning the
> > backing pages to a bo allows us to evict, and even shrink, vm along with
> > regular objects. Plus the dynamic allocation here has also been
> > discussed with the idea of sparse allocation of bo... Imo, we want to
> > use bo (probably based on gemfs) for both.
>
> Picking up an old story ... I guess you're talking about the PD
> reservation ppgtt needs on gen7 and which is stolen from the GGTT?
>
> One totally crazy idea I've had is to add an ->evict function to the vma
> and just use the vma to track this stuff, with no object attached. That
> should be enough for the shrinker, presuming we wrap enough code into the
> optional ->evict callback. By default it'd do the normal bo evict. And
> with a vfunc ->evict we could also subsume the pageflip stall and ctx
> switch tricks into the same infrastructure.
>
Strangely enough, it already only uses the vma... It is simplest though
just to reuse an obj to store the pages and dma addresses (reusing the
common code), which then ties directly into the evicter over the GGTT,
and shrinker for normal RAM. The only trick you then need is to create a
special vma for the pde in ggtt. After my initial concern, with a little
bit of care using the shmemfs for allocation is only marginally slower
than alloc_page(). I am not yet convinced about teaching evict/shrink
new tricks (or specialising the hammers).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-11-04 12:54 ` Daniel Vetter
@ 2014-11-04 16:29 ` Michel Thierry
0 siblings, 0 replies; 53+ messages in thread
From: Michel Thierry @ 2014-11-04 16:29 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On 11/4/2014 12:54 PM, Daniel Vetter wrote:
> On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
>> This is based on the first 55 patches of Ben's 48b addressing work, taking
>> into consideration the latest changes in (mainly aliasing) ppgtt rules.
>>
>> Because of these changes in the tree, the first 17 patches of the original
>> series are no longer needed, and some patches required more rework than others.
>>
>> For GEN8, it has also been extended to work in logical ring submission (lrc)
>> mode, as it looks like it will be the preferred mode of operation.
>> I also tried to update the lrc code at the same time the ppgtt refactoring
>> occurred, leaving only one patch that is exclusively for lrc.
>>
>> I'm asking for comments, as this is the foundation for 48b virtual addressing
>> in Broadwell.
>>
>> This list can be seen in 3 parts:
>> [01-24] Include code rework for PPGTT (all GENs).
>> [25-28] Adds page table allocation for GEN6/GEN7
>> [29-38] Enables dynamic allocation in GEN8. It is enabled for both legacy
>> and execlist submission modes.
>>
>> Ben Widawsky (37):
>> drm/i915: Add some extra guards in evict_vm
>> drm/i915/trace: Fix offsets for 64b
>> drm/i915: Wrap VMA binding
>> drm/i915: Make pin global flags explicit
>> drm/i915: Split out aliasing binds
>> drm/i915: fix gtt_total_entries()
>> drm/i915: Rename to GEN8_LEGACY_PDPES
>> drm/i915: Split out verbose PPGTT dumping
>> drm/i915: s/pd/pdpe, s/pt/pde
>> drm/i915: rename map/unmap to dma_map/unmap
>> drm/i915: Setup less PPGTT on failed pagedir
>> drm/i915: Un-hardcode number of page directories
>> drm/i915: Make gen6_write_pdes gen6_map_page_tables
>> drm/i915: Range clearing is PPGTT agnostic
>> drm/i915: Page table helpers, and define renames
>> drm/i915: construct page table abstractions
>> drm/i915: Complete page table structures
>> drm/i915: Create page table allocators
>> drm/i915: Generalize GEN6 mapping
>> drm/i915: Clean up pagetable DMA map & unmap
>> drm/i915: Always dma map page table allocations
>> drm/i915: Consolidate dma mappings
>> drm/i915: Always dma map page directory allocations
>> drm/i915: Track GEN6 page table usage
>> drm/i915: Extract context switch skip logic
>> drm/i915: Track page table reload need
>> drm/i915: Initialize all contexts
>> drm/i915: Finish gen6/7 dynamic page table allocation
>> drm/i915/bdw: Use dynamic allocation idioms on free
>> drm/i915/bdw: pagedirs rework allocation
>> drm/i915/bdw: pagetable allocation rework
>> drm/i915/bdw: Make the pdp switch a bit less hacky
>> drm/i915: num_pd_pages/num_pd_entries isn't useful
>> drm/i915: Extract PPGTT param from pagedir alloc
>> drm/i915/bdw: Split out mappings
>> drm/i915/bdw: begin bitmap tracking
>> drm/i915/bdw: Dynamic page table allocations
>>
>> Michel Thierry (1):
>> drm/i915/bdw: Dynamic page table allocations in lrc mode
> Ok, high level review:
>
> - The first part of this series seems to shuffle the code around in the
> vma binding code. If we actually want to fix this I think we need a
> REBIND flag and push the logic for rewriting ptes into the vma_bind
> hooks. There's no other way really to fix this, and the breakage for
> aliasing ppgtt this current code produces is what's blocking the cmd
> parser atm.
>
> - Imo merging the vma_bind/insert_entries hooks isn't useful; they
> provide a good abstraction. Ofc we should move the vma_bind/unbind
> functions into the vm functions, since the vfunc dictionary varies by vm
> and not by vma. And they only really provide a good abstraction if we
> first fix up the binding mess.
>
> - The code massively shuffles around the pte and page table handling code,
> promising a much better future. But I simply don't get it - to me this
> all looks like massive amounts of churn for no clear gain.
>
> - Imo the really critical part of dynamic pagetable alloc is where exactly
> we add this new memory allocation point and how we handle failures. But
> since there's so much churn that I somehow can't see through I can't
> have a solid opinion on the proposed design.
>
> So overall I think we should untangle this series first and throw out as
> much churn as possible. E.g. the binding rework is very much separate imo.
> Or someone needs to explain to me why we need all this code reflow.
>
> With that out of the way reviewing the changes due to dynamic page table
> alloc should be fairly simple. My expectation would have been that we'd
> add a new interface to allocate vm ranges and essentially leave all the
> current vma binding unchanged. Having the separate vm range extension is
> somewhat important since the drm_mm allocator has a bit of a tendency to
> just walk up the address space, wasting piles of free space lower down.
>
> Or like I've said, maybe I just don't see the real issues and have a way
> too naive opinion here.
>
> Cheers, Daniel
Thanks for the comments Daniel,
I'll prepare another version, trying to focus just on the dynamic alloc
part.
-Michel
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-11-04 13:01 ` Chris Wilson
@ 2014-11-05 9:19 ` Daniel Vetter
2014-11-05 9:50 ` Chris Wilson
0 siblings, 1 reply; 53+ messages in thread
From: Daniel Vetter @ 2014-11-05 9:19 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx
On Tue, Nov 04, 2014 at 01:01:09PM +0000, Chris Wilson wrote:
> On Tue, Nov 04, 2014 at 01:44:47PM +0100, Daniel Vetter wrote:
> > On Wed, Oct 08, 2014 at 08:13:33AM +0100, Chris Wilson wrote:
> > > On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
> > > > This is based on the first 55 patches of Ben's 48b addressing work, taking
> > > > into consideration the latest changes in (mainly aliasing) ppgtt rules.
> > > >
> > > > Because of these changes in the tree, the first 17 patches of the original
> > > > series are no longer needed, and some patches required more rework than others.
> > > >
> > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > mode, as it looks like it will be the preferred mode of operation.
> > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > occurred, leaving only one patch that is exclusively for lrc.
> > > >
> > > > I'm asking for comments, as this is the foundation for 48b virtual addressing
> > > > in Broadwell.
> > >
> > > I find the lack of activity tracking in the current ppgtt design severely
> > > limiting. We have a number of tests (both igt and mesa) that fail
> > > because the ppgtt pins gtt space for its lifetime. Transitioning the
> > > backing pages to a bo allows us to evict, and even shrink, vm along with
> > > regular objects. Plus the dynamic allocation here has also been
> > > discussed with the idea of sparse allocation of bo... Imo, we want to
> > > use bo (probably based on gemfs) for both.
> >
> > Picking up an old story ... I guess you're talking about the PD
> > reservation ppgtt needs on gen7 and which is stolen from the GGTT?
> >
> > One totally crazy idea I've had is to add an ->evict function to the vma
> > and just use the vma to track this stuff, with no object attached. That
> > should be enough for the shrinker, presuming we wrap enough code into the
> > optional ->evict callback. By default it'd do the normal bo evict. And
> > with a vfunc ->evict we could also subsume the pageflip stall and ctx
> > switch tricks into the same infrastructure.
> >
>
> Strangely enough, it already only uses the vma... It is simplest though
> just to reuse an obj to store the pages and dma addresses (reusing the
> common code), which then ties directly into the evicter over the GGTT,
> and shrinker for normal RAM. The only trick you then need is to create a
> special vma for the pde in ggtt. After my initial concern, with a little
> bit of care using the shmemfs for allocation is only marginally slower
> than alloc_page(). I am not yet convinced about teaching evict/shrink
> new tricks (or specialising the hammers).
Well the new trick would have been used mostly for pageflip and ctx
objects. But maybe not worth the trouble since especially with pageflips
that special type is interim ... We could abuse Tvrtko's ggtt_view stuff
though and hand out special ids for ppgtt pdes and act accordingly.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-11-05 9:19 ` Daniel Vetter
@ 2014-11-05 9:50 ` Chris Wilson
2014-11-05 9:58 ` Chris Wilson
0 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2014-11-05 9:50 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx
On Wed, Nov 05, 2014 at 10:19:47AM +0100, Daniel Vetter wrote:
> On Tue, Nov 04, 2014 at 01:01:09PM +0000, Chris Wilson wrote:
> > On Tue, Nov 04, 2014 at 01:44:47PM +0100, Daniel Vetter wrote:
> > > On Wed, Oct 08, 2014 at 08:13:33AM +0100, Chris Wilson wrote:
> > > > On Tue, Oct 07, 2014 at 06:10:56PM +0100, Michel Thierry wrote:
> > > > > This is based on the first 55 patches of Ben's 48b addressing work, taking
> > > > > into consideration the latest changes in (mainly aliasing) ppgtt rules.
> > > > >
> > > > > Because of these changes in the tree, the first 17 patches of the original
> > > > > series are no longer needed, and some patches required more rework than others.
> > > > >
> > > > > For GEN8, it has also been extended to work in logical ring submission (lrc)
> > > > > mode, as it looks like it will be the preferred mode of operation.
> > > > > I also tried to update the lrc code at the same time the ppgtt refactoring
> > > > > occurred, leaving only one patch that is exclusively for lrc.
> > > > >
> > > > > I'm asking for comments, as this is the foundation for 48b virtual addressing
> > > > > in Broadwell.
> > > >
> > > > I find the lack of activity tracking in the current ppgtt design severely
> > > > limiting. We have a number of tests (both igt and mesa) that fail
> > > > because the ppgtt pins gtt space for its lifetime. Transitioning the
> > > > backing pages to a bo allows us to evict, and even shrink, vm along with
> > > > regular objects. Plus the dynamic allocation here has also been
> > > > discussed with the idea of sparse allocation of bo... Imo, we want to
> > > > use bo (probably based on gemfs) for both.
> > >
> > > Picking up an old story ... I guess you're talking about the PD
> > > reservation ppgtt needs on gen7 and which is stolen from the GGTT?
> > >
> > > One totally crazy idea I've had is to add an ->evict function to the vma
> > > and just use the vma to track this stuff, with no object attached. That
> > > should be enough for the shrinker, presuming we wrap enough code into the
> > > optional ->evict callback. By default it'd do the normal bo evict. And
> > > with a vfunc ->evict we could also subsume the pageflip stall and ctx
> > > switch tricks into the same infrastructure.
> > >
> >
> > Strangely enough, it already only uses the vma... It is simplest though
> > just to reuse an obj to store the pages and dma addresses (reusing the
> > common code), which then ties directly into the evicter over the GGTT,
> > and shrinker for normal RAM. The only trick you then need is to create a
> > special vma for the pde in ggtt. After my initial concern, with a little
> > bit of care using the shmemfs for allocation is only marginally slower
> > than alloc_page(). I am not yet convinced about teaching evict/shrink
> > new tricks (or specialising the hammers).
>
> Well the new trick would have been used mostly for pageflip and ctx
> objects. But maybe not worth the trouble since especially with pageflips
> that special type is interim ... We could abuse Tvrtko's ggtt_view stuff
> though and hand out special ids for ppgtt pdes and act accordingly.
Hmm, for pageflips it is not that useful unless you give me a method to
pre-bind it elsewhere. Currently, I force a pagefault on objects that
are allocated for use as scanout so that we do not incur the stall when
flipping. Even secondary planes we often use direct access for uploads
and so will also want a ggtt binding before flipping. (If they were
linear, or we used accessors everywhere, we could just ignore the
fence and GGTT for direct access... But that gets complicated and slow
for anything other than simple transfers.)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [RFC 00/38] PPGTT dynamic page allocations
2014-11-05 9:50 ` Chris Wilson
@ 2014-11-05 9:58 ` Chris Wilson
0 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2014-11-05 9:58 UTC (permalink / raw)
To: Daniel Vetter, Michel Thierry, intel-gfx
On Wed, Nov 05, 2014 at 09:50:47AM +0000, Chris Wilson wrote:
> Hmm, for pageflips it is not that useful unless you give me a method to
> pre-bind it elsewhere. Currently, I force a pagefault on objects that
> are allocated for use as scanout so that we do not incur the stall when
> flipping. Even secondary planes we often use direct access for uploads
> and so will also want a ggtt binding before flipping. (If they were
> linear, or we used accessors everywhere, we could just ignore the
> fence and GGTT for direct access... But that gets complicated and slow
> for anything other than simple transfers.)
Actually using a separate vma for pageflips would not be that expensive
(i.e. no object stall) if you took my vma centric requests patch...
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
end of thread, other threads:[~2014-11-05 9:58 UTC | newest]
Thread overview: 53+ messages
2014-10-07 17:10 [RFC 00/38] PPGTT dynamic page allocations Michel Thierry
2014-10-07 17:10 ` [RFC 01/38] drm/i915: Add some extra guards in evict_vm Michel Thierry
2014-10-08 13:36 ` Daniel Vetter
2014-10-07 17:10 ` [RFC 02/38] drm/i915/trace: Fix offsets for 64b Michel Thierry
2014-10-07 17:10 ` [RFC 03/38] drm/i915: Wrap VMA binding Michel Thierry
2014-10-07 17:11 ` [RFC 04/38] drm/i915: Make pin global flags explicit Michel Thierry
2014-10-08 13:36 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 05/38] drm/i915: Split out aliasing binds Michel Thierry
2014-10-08 13:41 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 06/38] drm/i915: fix gtt_total_entries() Michel Thierry
2014-10-08 13:52 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 07/38] drm/i915: Rename to GEN8_LEGACY_PDPES Michel Thierry
2014-10-07 17:11 ` [RFC 08/38] drm/i915: Split out verbose PPGTT dumping Michel Thierry
2014-10-07 17:11 ` [RFC 09/38] drm/i915: s/pd/pdpe, s/pt/pde Michel Thierry
2014-10-08 13:55 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 10/38] drm/i915: rename map/unmap to dma_map/unmap Michel Thierry
2014-10-07 17:11 ` [RFC 11/38] drm/i915: Setup less PPGTT on failed pagedir Michel Thierry
2014-10-07 17:11 ` [RFC 12/38] drm/i915: Un-hardcode number of page directories Michel Thierry
2014-10-07 17:11 ` [RFC 13/38] drm/i915: Make gen6_write_pdes gen6_map_page_tables Michel Thierry
2014-10-08 14:04 ` Daniel Vetter
2014-10-07 17:11 ` [RFC 14/38] drm/i915: Range clearing is PPGTT agnostic Michel Thierry
2014-10-07 17:11 ` [RFC 15/38] drm/i915: Page table helpers, and define renames Michel Thierry
2014-10-07 17:11 ` [RFC 16/38] drm/i915: construct page table abstractions Michel Thierry
2014-10-07 17:11 ` [RFC 17/38] drm/i915: Complete page table structures Michel Thierry
2014-10-07 17:11 ` [RFC 18/38] drm/i915: Create page table allocators Michel Thierry
2014-10-07 17:11 ` [RFC 19/38] drm/i915: Generalize GEN6 mapping Michel Thierry
2014-10-07 17:11 ` [RFC 20/38] drm/i915: Clean up pagetable DMA map & unmap Michel Thierry
2014-10-07 17:11 ` [RFC 21/38] drm/i915: Always dma map page table allocations Michel Thierry
2014-10-07 17:11 ` [RFC 22/38] drm/i915: Consolidate dma mappings Michel Thierry
2014-10-07 17:11 ` [RFC 23/38] drm/i915: Always dma map page directory allocations Michel Thierry
2014-10-07 17:11 ` [RFC 24/38] drm/i915: Track GEN6 page table usage Michel Thierry
2014-10-07 17:11 ` [RFC 25/38] drm/i915: Extract context switch skip logic Michel Thierry
2014-10-07 17:11 ` [RFC 26/38] drm/i915: Track page table reload need Michel Thierry
2014-10-07 17:11 ` [RFC 27/38] drm/i915: Initialize all contexts Michel Thierry
2014-10-07 17:11 ` [RFC 28/38] drm/i915: Finish gen6/7 dynamic page table allocation Michel Thierry
2014-10-07 17:11 ` [RFC 29/38] drm/i915/bdw: Use dynamic allocation idioms on free Michel Thierry
2014-10-07 17:11 ` [RFC 30/38] drm/i915/bdw: pagedirs rework allocation Michel Thierry
2014-10-07 17:11 ` [RFC 31/38] drm/i915/bdw: pagetable allocation rework Michel Thierry
2014-10-07 17:11 ` [RFC 32/38] drm/i915/bdw: Make the pdp switch a bit less hacky Michel Thierry
2014-10-07 17:11 ` [RFC 33/38] drm/i915: num_pd_pages/num_pd_entries isn't useful Michel Thierry
2014-10-07 17:11 ` [RFC 34/38] drm/i915: Extract PPGTT param from pagedir alloc Michel Thierry
2014-10-07 17:11 ` [RFC 35/38] drm/i915/bdw: Split out mappings Michel Thierry
2014-10-07 17:11 ` [RFC 36/38] drm/i915/bdw: begin bitmap tracking Michel Thierry
2014-10-07 17:11 ` [RFC 37/38] drm/i915/bdw: Dynamic page table allocations Michel Thierry
2014-10-07 17:11 ` [RFC 38/38] drm/i915/bdw: Dynamic page table allocations in lrc mode Michel Thierry
2014-10-08 7:13 ` [RFC 00/38] PPGTT dynamic page allocations Chris Wilson
2014-11-04 12:44 ` Daniel Vetter
2014-11-04 13:01 ` Chris Wilson
2014-11-05 9:19 ` Daniel Vetter
2014-11-05 9:50 ` Chris Wilson
2014-11-05 9:58 ` Chris Wilson
2014-11-04 12:54 ` Daniel Vetter
2014-11-04 16:29 ` Michel Thierry