* [PATCH v6 00/19] 48-bit PPGTT
@ 2015-07-29 16:23 Michel Thierry
2015-07-29 16:23 ` [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
` (20 more replies)
0 siblings, 21 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
This clean-up version delays the 48-bit work to later patches and includes
more review comments from Akash and Chris. The first 5 patches prepare the
dynamic page allocation code to handle independent pdps, but no specific
code for 48-bit mode is added before the 5th patch.
In order to expand the GPU address space, a 4th level translation is added,
the Page Map Level 4 (PML4). This PML4 has 512 PML4 Entries (PML4E),
PML4[0-511], each pointing to a PDP. All the existing "dynamic alloc
ppgtt" functions are used, only adding the 4th level changes. I also
updated some remaining variables that were 32b only.
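As a rough illustration of the 4-level walk described above, the sketch below splits a 48-bit GPU virtual address into its four table indices. It assumes 4 KiB pages and 9 index bits (512 entries) per level; the shift values and the names split_address and ppgtt_indices are illustrative assumptions, not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, 512 entries (9 bits) per level. */
#define PTE_SHIFT	12
#define PDE_SHIFT	21
#define PDPE_SHIFT	30
#define PML4E_SHIFT	39
#define LEVEL_MASK	0x1ffu

/* Hypothetical container for the four indices of one address. */
struct ppgtt_indices {
	unsigned pml4e;	/* selects one of 512 PDPs */
	unsigned pdpe;	/* selects one of 512 page directories */
	unsigned pde;	/* selects one of 512 page tables */
	unsigned pte;	/* selects one of 512 pages */
};

static struct ppgtt_indices split_address(uint64_t addr)
{
	struct ppgtt_indices idx;

	/* Only the low 48 bits take part in the translation. */
	assert(addr < (1ULL << 48));

	idx.pml4e = (addr >> PML4E_SHIFT) & LEVEL_MASK;
	idx.pdpe = (addr >> PDPE_SHIFT) & LEVEL_MASK;
	idx.pde = (addr >> PDE_SHIFT) & LEVEL_MASK;
	idx.pte = (addr >> PTE_SHIFT) & LEVEL_MASK;

	return idx;
}
```

Under these assumptions, the first address past 4GB (1ULL << 32) lands in PML4E 0, PDPE 4, which is the first entry beyond the 4 legacy PDPEs.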
There are 2 hardware workarounds needed to allow correct operation with
48b addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset).
A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can
be allocated outside the first 4 PDPs; if not, the end range is forced to 4GB.
Also, more objects now use the DRM_MM_CREATE_TOP flag. To maintain
compatibility, in libdrm I added a new drm_intel_bo_emit_reloc_48bit function
that will flag these objects, while the existing drm_intel_bo_emit_reloc
clears it.
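The effect of the flag on placement can be sketched as below. This is only a model of the rule stated above (objects without the flag end at 4GB), not the driver code; MODEL_SUPPORTS_48B and placement_end are hypothetical names, and the bit value is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXEC_OBJECT_SUPPORTS_48B_ADDRESS. */
#define MODEL_SUPPORTS_48B	(1u << 0)
#define LIMIT_4GB		(1ULL << 32)

/*
 * Return the exclusive upper bound of the range an object may be
 * placed in, given the size of the address space and its flags.
 */
static uint64_t placement_end(uint64_t vm_total, unsigned flags)
{
	assert(vm_total > 0);

	/* Flagged objects may use the whole address space. */
	if (flags & MODEL_SUPPORTS_48B)
		return vm_total;

	/* Everything else stays inside the first 4GB (the first 4 PDPs). */
	return vm_total < LIMIT_4GB ? vm_total : LIMIT_4GB;
}
```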
Finally, this feature is only available in BDW and Gen9, requires LRC
submission mode (execlists) and it can be detected by i915.enable_ppgtt=3.
Also note that this expanded address space is only available for full
PPGTT, aliasing PPGTT and Global GTT remain 32-bit.
I'll resend the userland patches (libdrm/mesa) in a different patchset; they
haven't changed, but they require a rebase. I will also expand the
ppgtt igt test per Chris's suggestions.
Michel Thierry (19):
drm/i915: Remove unnecessary gen8_clamp_pd
drm/i915/gen8: Make pdp allocation more dynamic
drm/i915/gen8: Abstract PDP usage
drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
drm/i915/gen8: Add dynamic page trace events
drm/i915/gen8: Add PML4 structure
drm/i915/gen8: implement alloc/free for 4lvl
drm/i915/gen8: Add 4 level switching infrastructure and lrc support
drm/i915/gen8: Pass sg_iter through pte inserts
drm/i915/gen8: Add 4 level support in insert_entries and clear_range
drm/i915/gen8: Initialize PDPs and PML4
drm/i915: Expand error state's address width to 64b
drm/i915/gen8: Add ppgtt info and debug_dump
drm/i915: object size needs to be u64
drm/i915: batch_obj vm offset must be u64
drm/i915/userptr: Kill user_size limit check
drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
drm/i915/gen8: Flip the 48b switch
drm/i915: Save some page table setup on repeated binds
drivers/gpu/drm/i915/i915_debugfs.c | 18 +-
drivers/gpu/drm/i915/i915_drv.h | 11 +-
drivers/gpu/drm/i915/i915_gem.c | 30 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +
drivers/gpu/drm/i915/i915_gem_gtt.c | 665 ++++++++++++++++++++++++-----
drivers/gpu/drm/i915/i915_gem_gtt.h | 64 ++-
drivers/gpu/drm/i915/i915_gem_userptr.c | 4 -
drivers/gpu/drm/i915/i915_gpu_error.c | 24 +-
drivers/gpu/drm/i915/i915_params.c | 2 +-
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/i915_trace.h | 32 +-
drivers/gpu/drm/i915/intel_lrc.c | 60 ++-
include/uapi/drm/i915_drm.h | 3 +-
13 files changed, 747 insertions(+), 180 deletions(-)
--
2.4.5
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 3:06 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
` (19 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory
boundary.
Furthermore, i915_pte_count also restricts to the next page table
boundary.
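The argument can be checked with a small userspace model. The sketch below assumes a page table covers 2 MiB and a page directory 1 GiB (512 entries); pdes_visited is a simplified stand-in for the gen8_for_each_pde walk, and clamp_pd repeats the arithmetic of the helper being removed. All names and shift values here are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_SHIFT	21	/* assumed: one page table covers 2 MiB */
#define PDPE_SHIFT	30	/* assumed: one page directory covers 1 GiB */
#define PDES		512
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Same arithmetic as the removed gen8_clamp_pd() helper. */
static uint64_t clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = ALIGN_UP(start + 1, 1ULL << PDPE_SHIFT);

	if (next_pd > start + length)
		return length;

	return next_pd - start;
}

/* Count the PDEs a walk touches; it stops at the directory's last entry. */
static unsigned pdes_visited(uint64_t start, uint64_t length)
{
	unsigned iter = (start >> PDE_SHIFT) & (PDES - 1);
	unsigned n = 0;

	assert(start + length >= start);	/* no wrap-around */

	while (length > 0 && iter < PDES) {
		uint64_t temp = ALIGN_UP(start + 1, 1ULL << PDE_SHIFT) - start;

		if (temp > length)
			temp = length;
		start += temp;
		length -= temp;
		iter++;
		n++;
	}

	return n;
}
```

In this model, walking with the raw length and walking with clamp_pd(start, length) touch the same number of entries, because the iter < PDES bound already ends the walk at the directory boundary.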
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
drivers/gpu/drm/i915/i915_gem_gtt.h | 11 -----------
2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c2a291e..189572d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -955,7 +955,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
gen8_pde_t *const page_directory = kmap_px(pd);
struct i915_page_table *pt;
- uint64_t pd_len = gen8_clamp_pd(start, length);
+ uint64_t pd_len = length;
uint64_t pd_start = start;
uint32_t pde;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e1cfa29..d5bf953 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -444,17 +444,6 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
-/* Clamp length to the next page_directory boundary */
-static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
-{
- uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
-
- if (next_pd > (start + length))
- return length;
-
- return next_pd - start;
-}
-
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
--
2.4.5
* [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
2015-07-29 16:23 ` [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 3:18 ` Goel, Akash
2015-08-05 15:31 ` Daniel Vetter
2015-07-29 16:23 ` [PATCH v6 03/19] drm/i915/gen8: Abstract PDP usage Michel Thierry
` (18 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier.
v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and follow
his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
v9: Defer introducing PML4 (and 48-bit checks) until the next patch (Akash).
v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 86 +++++++++++++++++++++++++++++--------
drivers/gpu/drm/i915/i915_gem_gtt.h | 17 +++++---
2 files changed, 80 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 189572d..28f3227 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -522,6 +522,43 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
fill_px(vm->dev, pd, scratch_pde);
}
+static int __pdp_init(struct drm_device *dev,
+ struct i915_page_directory_pointer *pdp)
+{
+ size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+ pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+ sizeof(unsigned long),
+ GFP_KERNEL);
+ if (!pdp->used_pdpes)
+ return -ENOMEM;
+
+ pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
+ GFP_KERNEL);
+ if (!pdp->page_directory) {
+ kfree(pdp->used_pdpes);
+ /* the PDP might be the statically allocated top level. Keep it
+ * as clean as possible */
+ pdp->used_pdpes = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void __pdp_fini(struct i915_page_directory_pointer *pdp)
+{
+ kfree(pdp->used_pdpes);
+ kfree(pdp->page_directory);
+ pdp->page_directory = NULL;
+}
+
+static void free_pdp(struct drm_device *dev,
+ struct i915_page_directory_pointer *pdp)
+{
+ __pdp_fini(pdp);
+}
+
/* Broadwell Page Directory Pointer Descriptors */
static int gen8_write_pdp(struct drm_i915_gem_request *req,
unsigned entry,
@@ -720,7 +757,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
container_of(vm, struct i915_hw_ppgtt, base);
int i;
- for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+ for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+ I915_PDPES_PER_PDP(ppgtt->base.dev)) {
if (WARN_ON(!ppgtt->pdp.page_directory[i]))
continue;
@@ -729,6 +767,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
}
+ free_pdp(ppgtt->base.dev, &ppgtt->pdp);
gen8_free_scratch(vm);
}
@@ -763,7 +802,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
gen8_for_each_pde(pt, pd, start, length, temp, pde) {
/* Don't reallocate page tables */
- if (pt) {
+ if (test_bit(pde, pd->used_pdes)) {
/* Scratch is never allocated this way */
WARN_ON(pt == ppgtt->base.scratch_pt);
continue;
@@ -820,11 +859,12 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
struct i915_page_directory *pd;
uint64_t temp;
uint32_t pdpe;
+ uint32_t pdpes = I915_PDPES_PER_PDP(dev);
- WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+ WARN_ON(!bitmap_empty(new_pds, pdpes));
gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
- if (pd)
+ if (test_bit(pdpe, pdp->used_pdpes))
continue;
pd = alloc_pd(dev);
@@ -839,18 +879,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
return 0;
unwind_out:
- for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+ for_each_set_bit(pdpe, new_pds, pdpes)
free_pd(dev, pdp->page_directory[pdpe]);
return -ENOMEM;
}
static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
+ uint32_t pdpes)
{
int i;
- for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+ for (i = 0; i < pdpes; i++)
kfree(new_pts[i]);
kfree(new_pts);
kfree(new_pds);
@@ -861,23 +902,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
*/
static
int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
- unsigned long ***new_pts)
+ unsigned long ***new_pts,
+ uint32_t pdpes)
{
int i;
unsigned long *pds;
unsigned long **pts;
- pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+ pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
if (!pds)
return -ENOMEM;
- pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
+ pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
if (!pts) {
kfree(pds);
return -ENOMEM;
}
- for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+ for (i = 0; i < pdpes; i++) {
pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
sizeof(unsigned long), GFP_KERNEL);
if (!pts[i])
@@ -890,7 +932,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
return 0;
err_out:
- free_gen8_temp_bitmaps(pds, pts);
+ free_gen8_temp_bitmaps(pds, pts, pdpes);
return -ENOMEM;
}
@@ -916,6 +958,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
const uint64_t orig_length = length;
uint64_t temp;
uint32_t pdpe;
+ uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
int ret;
/* Wrap is never okay since we can only represent 48b, and we don't
@@ -927,7 +970,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
if (WARN_ON(start + length > ppgtt->base.total))
return -ENODEV;
- ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+ ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
if (ret)
return ret;
@@ -935,7 +978,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
new_page_dirs);
if (ret) {
- free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
@@ -989,7 +1032,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
__set_bit(pdpe, ppgtt->pdp.used_pdpes);
}
- free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
mark_tlbs_dirty(ppgtt);
return 0;
@@ -999,10 +1042,10 @@ err_out:
free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
}
- for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+ for_each_set_bit(pdpe, new_page_dirs, pdpes)
free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
- free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
mark_tlbs_dirty(ppgtt);
return ret;
}
@@ -1040,7 +1083,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
+ ret = __pdp_init(ppgtt->base.dev, &ppgtt->pdp);
+
+ if (ret)
+ goto free_scratch;
+
return 0;
+
+free_scratch:
+ gen8_free_scratch(&ppgtt->base);
+ return ret;
}
static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d5bf953..87e389c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -98,6 +98,9 @@ typedef uint64_t gen8_pde_t;
#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
+/* FIXME: Next patch will use dev */
+#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
+
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
#define PPAT_CACHED_INDEX _PAGE_PAT /* WB LLCeLLC */
@@ -241,9 +244,10 @@ struct i915_page_directory {
};
struct i915_page_directory_pointer {
- /* struct page *page; */
- DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
- struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
+ struct i915_page_dma base;
+
+ unsigned long *used_pdpes;
+ struct i915_page_directory **page_directory;
};
struct i915_address_space {
@@ -436,9 +440,10 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
- for (iter = gen8_pdpe_index(start); \
- pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES; \
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
+ for (iter = gen8_pdpe_index(start); \
+ pd = (pdp)->page_directory[iter], \
+ length > 0 && (iter < I915_PDPES_PER_PDP(dev)); \
iter++, \
temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start, \
temp = min(temp, length), \
--
2.4.5
* [PATCH v6 03/19] drm/i915/gen8: Abstract PDP usage
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
2015-07-29 16:23 ` [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
2015-07-29 16:23 ` [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
` (17 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e. Places where this is not yet possible use a
temporary pdp local variable.
v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
v7: Keep pagetable map in-line (and avoid unnecessary for_each_pde
loops), remove redundant ppgtt pointer in _alloc_pagetabs (Akash)
v8: Fix text indentation in _alloc_pagetabs/page_directories (Chris)
v9: Defer gen8_alloc_va_range_4lvl definition until 4lvl is implemented,
clean-up gen8_ppgtt_cleanup [pun intended] (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 84 +++++++++++++++++++------------------
1 file changed, 44 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 28f3227..bd56979 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,6 +607,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr, scratch_pte;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -621,10 +622,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
struct i915_page_directory *pd;
struct i915_page_table *pt;
- if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+ if (WARN_ON(!pdp->page_directory[pdpe]))
break;
- pd = ppgtt->pdp.page_directory[pdpe];
+ pd = pdp->page_directory[pdpe];
if (WARN_ON(!pd->page_table[pde]))
break;
@@ -662,6 +663,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -675,7 +677,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
break;
if (pt_vaddr == NULL) {
- struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
+ struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
pt_vaddr = kmap_px(pt);
}
@@ -755,28 +757,29 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+ struct drm_device *dev = ppgtt->base.dev;
int i;
- for_each_set_bit(i, ppgtt->pdp.used_pdpes,
- I915_PDPES_PER_PDP(ppgtt->base.dev)) {
- if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+ for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+ if (WARN_ON(!pdp->page_directory[i]))
continue;
- gen8_free_page_tables(ppgtt->base.dev,
- ppgtt->pdp.page_directory[i]);
- free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
+ gen8_free_page_tables(dev, pdp->page_directory[i]);
+ free_pd(dev, pdp->page_directory[i]);
}
- free_pdp(ppgtt->base.dev, &ppgtt->pdp);
+ free_pdp(dev, pdp);
+
gen8_free_scratch(vm);
}
/**
* gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt: Master ppgtt structure.
- * @pd: Page directory for this address range.
+ * @vm: Master vm structure.
+ * @pd: Page directory for this address range.
* @start: Starting virtual address to begin allocations.
- * @length Size of the allocations.
+ * @length: Size of the allocations.
* @new_pts: Bitmap set by function with new allocations. Likely used by the
* caller to free on error.
*
@@ -789,13 +792,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
*
* Return: 0 if success; negative error code otherwise.
*/
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
struct i915_page_directory *pd,
uint64_t start,
uint64_t length,
unsigned long *new_pts)
{
- struct drm_device *dev = ppgtt->base.dev;
+ struct drm_device *dev = vm->dev;
struct i915_page_table *pt;
uint64_t temp;
uint32_t pde;
@@ -804,7 +807,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
/* Don't reallocate page tables */
if (test_bit(pde, pd->used_pdes)) {
/* Scratch is never allocated this way */
- WARN_ON(pt == ppgtt->base.scratch_pt);
+ WARN_ON(pt == vm->scratch_pt);
continue;
}
@@ -812,7 +815,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
if (IS_ERR(pt))
goto unwind_out;
- gen8_initialize_pt(&ppgtt->base, pt);
+ gen8_initialize_pt(vm, pt);
pd->page_table[pde] = pt;
__set_bit(pde, new_pts);
}
@@ -828,11 +831,11 @@ unwind_out:
/**
* gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt: Master ppgtt structure.
+ * @vm: Master vm structure.
* @pdp: Page directory pointer for this address range.
* @start: Starting virtual address to begin allocations.
- * @length Size of the allocations.
- * @new_pds Bitmap set by function with new allocations. Likely used by the
+ * @length: Size of the allocations.
+ * @new_pds: Bitmap set by function with new allocations. Likely used by the
* caller to free on error.
*
* Allocate the required number of page directories starting at the pde index of
@@ -849,13 +852,14 @@ unwind_out:
*
* Return: 0 if success; negative error code otherwise.
*/
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
- struct i915_page_directory_pointer *pdp,
- uint64_t start,
- uint64_t length,
- unsigned long *new_pds)
+static int
+gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length,
+ unsigned long *new_pds)
{
- struct drm_device *dev = ppgtt->base.dev;
+ struct drm_device *dev = vm->dev;
struct i915_page_directory *pd;
uint64_t temp;
uint32_t pdpe;
@@ -871,7 +875,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
if (IS_ERR(pd))
goto unwind_out;
- gen8_initialize_pd(&ppgtt->base, pd);
+ gen8_initialize_pd(vm, pd);
pdp->page_directory[pdpe] = pd;
__set_bit(pdpe, new_pds);
}
@@ -947,18 +951,19 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
}
static int gen8_alloc_va_range(struct i915_address_space *vm,
- uint64_t start,
- uint64_t length)
+ uint64_t start, uint64_t length)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
unsigned long *new_page_dirs, **new_page_tables;
+ struct drm_device *dev = vm->dev;
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct i915_page_directory *pd;
const uint64_t orig_start = start;
const uint64_t orig_length = length;
uint64_t temp;
uint32_t pdpe;
- uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
+ uint32_t pdpes = I915_PDPES_PER_PDP(dev);
int ret;
/* Wrap is never okay since we can only represent 48b, and we don't
@@ -967,7 +972,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
if (WARN_ON(start + length < start))
return -ENODEV;
- if (WARN_ON(start + length > ppgtt->base.total))
+ if (WARN_ON(start + length > vm->total))
return -ENODEV;
ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
@@ -975,16 +980,16 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
return ret;
/* Do the allocations first so we can easily bail out */
- ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
- new_page_dirs);
+ ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
+ new_page_dirs);
if (ret) {
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
/* For every page directory referenced, allocate page tables */
- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
- ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+ ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
new_page_tables[pdpe]);
if (ret)
goto err_out;
@@ -995,7 +1000,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
/* Allocations have completed successfully, so set the bitmaps, and do
* the mappings. */
- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
gen8_pde_t *const page_directory = kmap_px(pd);
struct i915_page_table *pt;
uint64_t pd_len = length;
@@ -1028,8 +1033,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
}
kunmap_px(ppgtt, page_directory);
-
- __set_bit(pdpe, ppgtt->pdp.used_pdpes);
+ __set_bit(pdpe, pdp->used_pdpes);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1039,11 +1043,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
err_out:
while (pdpe--) {
for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
- free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
+ free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
}
for_each_set_bit(pdpe, new_page_dirs, pdpes)
- free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
+ free_pd(dev, pdp->page_directory[pdpe]);
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
mark_tlbs_dirty(ppgtt);
--
2.4.5
* [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (2 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 03/19] drm/i915/gen8: Abstract PDP usage Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 4:46 ` Goel, Akash
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 05/19] drm/i915/gen8: Add dynamic page trace events Michel Thierry
` (16 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
The insert_entries function was the function used to write PTEs. For the
PPGTT it was "hardcoded" to only understand two level page tables, which
was the case for GEN7. We can reuse this for 4 level page tables, and
remove the concept of insert_entries, which was never viable past 2
level page tables anyway, but it requires a bit of rework to make the
function a bit more generic.
This patch begins the generalization work, which will be used heavily
once the 48b code is complete. The patch series attempts to make
each function that touches page table code specific to one page table
level, and this patch is no exception.
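The shape of the split can be shown with a toy model: a level-specific helper that works on whichever pdp it is given, and a thin wrapper that picks the pdp and the scratch value. The structures and names below (pdp_model, vm_model, clear_pte_range, clear_range) are hypothetical and flatten the real page table hierarchy into a single array; this is a sketch of the pattern, not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTES 16

/* Toy pdp: the whole hierarchy below it is flattened to one array. */
struct pdp_model {
	uint64_t pte[MODEL_PTES];
};

/* Toy address space with a single pdp, as in the legacy 32b case. */
struct vm_model {
	struct pdp_model pdp;
	uint64_t scratch_pte;
};

/* Level-specific helper: the pdp and scratch value come from the caller. */
static void clear_pte_range(struct pdp_model *pdp, size_t first,
			    size_t count, uint64_t scratch_pte)
{
	size_t i;

	assert(first + count <= MODEL_PTES);

	for (i = first; i < first + count; i++)
		pdp->pte[i] = scratch_pte;
}

/* Wrapper: resolve the pdp for this address space, then delegate. */
static void clear_range(struct vm_model *vm, size_t first, size_t count)
{
	struct pdp_model *pdp = &vm->pdp;	/* a 4-level wrapper would look this up per pml4e */

	clear_pte_range(pdp, first, count, vm->scratch_pte);
}
```

With this shape, a later 4-level wrapper only has to loop over the pdps it finds and call the same helper for each one.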
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++++----------
1 file changed, 39 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd56979..f338a13 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -600,24 +600,21 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
}
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
- uint64_t start,
- uint64_t length,
- bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length,
+ gen8_pte_t scratch_pte)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
- gen8_pte_t *pt_vaddr, scratch_pte;
+ gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
- scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
- I915_CACHE_LLC, use_scratch);
-
while (num_entries) {
struct i915_page_directory *pd;
struct i915_page_table *pt;
@@ -656,14 +653,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
}
}
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
- struct sg_table *pages,
- uint64_t start,
- enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+ uint64_t start,
+ uint64_t length,
+ bool use_scratch)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+ gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+ I915_CACHE_LLC, use_scratch);
+
+ gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+}
+
+static void
+gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -700,6 +713,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
kunmap_px(ppgtt, pt_vaddr);
}
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level,
+ u32 unused)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+ gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+}
+
static void gen8_free_page_tables(struct drm_device *dev,
struct i915_page_directory *pd)
{
--
2.4.5
* [PATCH v6 05/19] drm/i915/gen8: Add dynamic page trace events
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (3 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 3:48 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure Michel Thierry
` (15 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
The dynamic page allocation patch series added trace events for GEN6;
this patch adds them for GEN8.
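For readers following the trace output, here is a minimal sketch of how the printed range end is derived from the start address and the entry shift. It restates the expression used in TP_fast_assign of the event class in this patch; the function name px_entry_end is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the driver): compute the last address covered
 * by the page-table entry that contains 'start', for an entry that spans
 * 2^px_shift bytes. Same arithmetic as the trace event class:
 *   ((start + (1ULL << px_shift)) & ~((1ULL << px_shift) - 1)) - 1
 */
static uint64_t px_entry_end(uint64_t start, uint64_t px_shift)
{
	assert(px_shift < 64);	/* a larger shift would be undefined in C */

	const uint64_t size = 1ULL << px_shift;

	/* Round up to the next entry boundary, then step back one byte. */
	return ((start + size) & ~(size - 1)) - 1;
}
```

With a shift of 21 (GEN8_PDE_SHIFT), a start of 0x201000 falls in the 2MB entry that ends at 0x3fffff, so the event would print the range 0x201000-0x3fffff.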
v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after gen8_map_pagetable_range removal.
v7: Use generic page name (px) in DECLARE_EVENT_CLASS (Akash)
v8: Defer define of i915_page_directory_pointer_entry_alloc (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++++++
drivers/gpu/drm/i915/i915_trace.h | 24 ++++++++++++++++--------
2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f338a13..8c1db92 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -844,6 +844,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
gen8_initialize_pt(vm, pt);
pd->page_table[pde] = pt;
__set_bit(pde, new_pts);
+ trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
}
return 0;
@@ -904,6 +905,7 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
gen8_initialize_pd(vm, pd);
pdp->page_directory[pdpe] = pd;
__set_bit(pdpe, new_pds);
+ trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
}
return 0;
@@ -1053,6 +1055,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
/* Map the PDE to the page table */
page_directory[pde] = gen8_pde_encode(px_dma(pt),
I915_CACHE_LLC);
+ trace_i915_page_table_entry_map(&ppgtt->base, pde, pt,
+ gen8_pte_index(start),
+ gen8_pte_count(start, length),
+ GEN8_PTES);
/* NB: We haven't yet mapped ptes to pages. At this
* point we're still relying on insert_entries() */
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 2f34c47..f230d76 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -186,33 +186,41 @@ DEFINE_EVENT(i915_va, i915_va_alloc,
TP_ARGS(vm, start, length, name)
);
-DECLARE_EVENT_CLASS(i915_page_table_entry,
- TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
- TP_ARGS(vm, pde, start, pde_shift),
+DECLARE_EVENT_CLASS(i915_px_entry,
+ TP_PROTO(struct i915_address_space *vm, u32 px, u64 start, u64 px_shift),
+ TP_ARGS(vm, px, start, px_shift),
TP_STRUCT__entry(
__field(struct i915_address_space *, vm)
- __field(u32, pde)
+ __field(u32, px)
__field(u64, start)
__field(u64, end)
),
TP_fast_assign(
__entry->vm = vm;
- __entry->pde = pde;
+ __entry->px = px;
__entry->start = start;
- __entry->end = ((start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1)) - 1;
+ __entry->end = ((start + (1ULL << px_shift)) & ~((1ULL << px_shift)-1)) - 1;
),
TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
- __entry->vm, __entry->pde, __entry->start, __entry->end)
+ __entry->vm, __entry->px, __entry->start, __entry->end)
);
-DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
+DEFINE_EVENT(i915_px_entry, i915_page_table_entry_alloc,
TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
TP_ARGS(vm, pde, start, pde_shift)
);
+DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_entry_alloc,
+ TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+ TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+ TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+ __entry->vm, __entry->px, __entry->start, __entry->end)
+);
+
/* Avoid extra math because we only support two sizes. The format is defined by
* bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
#define TRACE_PT_SIZE(bits) \
--
2.4.5
* [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (4 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 05/19] drm/i915/gen8: Add dynamic page trace events Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 4:01 ` Goel, Akash
2015-07-30 10:04 ` [PATCH v7 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 07/19] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
` (14 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Introduces the Page Map Level 4 (PML4), i.e. the new top-level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
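As an illustration of the 4-level layout this patch documents in i915_gem_gtt.h (47:39 PML4E | 38:30 PDPE | 29:21 PDE | 20:12 PTE | 11:0 offset), the following is a minimal sketch of how a 48-bit address splits into per-level indices. The struct and function names are hypothetical and exist only for this example; the shifts and the 9-bit index width are taken from the defines in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts follow the layout comment added by this patch. */
#define PML4E_SHIFT	39
#define PDPE_SHIFT	30
#define PDE_SHIFT	21
#define PTE_SHIFT	12
#define INDEX_MASK	0x1ffu	/* 9 bits: 512 entries per level */

/* Hypothetical container for the example, not a driver structure. */
struct va_indices {
	uint32_t pml4e, pdpe, pde, pte, offset;
};

static struct va_indices split_48b_address(uint64_t addr)
{
	struct va_indices idx;

	assert(addr < (1ULL << 48));	/* only 48 bits are translated */

	idx.pml4e  = (addr >> PML4E_SHIFT) & INDEX_MASK;
	idx.pdpe   = (addr >> PDPE_SHIFT) & INDEX_MASK;
	idx.pde    = (addr >> PDE_SHIFT) & INDEX_MASK;
	idx.pte    = (addr >> PTE_SHIFT) & INDEX_MASK;
	idx.offset = addr & 0xfff;	/* byte offset within the 4KB page */

	return idx;
}
```

For example, the address 1ULL << 32 (the first byte above 4GB) yields pml4e = 0 and pdpe = 4, which is one entry past the four legacy PDPEs.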
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++-------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 26 ++++++++++++++++++++-----
3 files changed, 48 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 40fea41..0b5cbe8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
#define USES_PPGTT(dev) (i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
+#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
#define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
#define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8c1db92..1a120a4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,12 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
{
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+ bool has_full_64bit_ppgtt;
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+ has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
+ INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +128,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
+ if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+ return 3;
+
#ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -557,6 +563,8 @@ static void free_pdp(struct drm_device *dev,
struct i915_page_directory_pointer *pdp)
{
__pdp_fini(pdp);
+ if (USES_FULL_48BIT_PPGTT(dev))
+ kfree(pdp);
}
/* Broadwell Page Directory Pointer Descriptors */
@@ -686,9 +694,6 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
- if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
- break;
-
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -1102,14 +1107,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
return ret;
ppgtt->base.start = 0;
- ppgtt->base.total = 1ULL << 32;
- if (IS_ENABLED(CONFIG_X86_32))
- /* While we have a proliferation of size_t variables
- * we cannot represent the full ppgtt size on 32bit,
- * so limit it to the same size as the GGTT (currently
- * 2GiB).
- */
- ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
ppgtt->base.allocate_va_range = gen8_alloc_va_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
@@ -1119,10 +1116,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
- ret = __pdp_init(false, &ppgtt->pdp);
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = __pdp_init(false, &ppgtt->pdp);
- if (ret)
+ if (ret)
+ goto free_scratch;
+
+ ppgtt->base.total = 1ULL << 32;
+ if (IS_ENABLED(CONFIG_X86_32))
+ /* While we have a proliferation of size_t variables
+ * we cannot represent the full ppgtt size on 32bit,
+ * so limit it to the same size as the GGTT (currently
+ * 2GiB).
+ */
+ ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+ } else {
+ ppgtt->base.total = 1ULL << 48;
+ ret = -EPERM; /* Not yet implemented */
goto free_scratch;
+ }
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 87e389c..04bc66f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
* PDPE | PDE | PTE | offset
* The difference as compared to normal x86 3 level page table is the PDPEs are
* programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
+ * PML4E | PDPE | PDE | PTE | offset
*/
+#define GEN8_PML4ES_PER_PML4 512
+#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT 30
-#define GEN8_PDPE_MASK 0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK 0x1ff
#define GEN8_PDE_SHIFT 21
#define GEN8_PDE_MASK 0x1ff
#define GEN8_PTE_SHIFT 12
@@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
-/* FIXME: Next patch will use dev */
-#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
+#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
@@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
struct i915_page_directory **page_directory;
};
+struct i915_pml4 {
+ struct i915_page_dma base;
+
+ DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+ struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
+};
+
struct i915_address_space {
struct drm_mm mm;
struct drm_device *dev;
@@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
struct drm_mm_node node;
unsigned long pd_dirty_rings;
union {
- struct i915_page_directory_pointer pdp;
- struct i915_page_directory pd;
+ struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
+ struct i915_page_directory_pointer pdp; /* GEN8+ */
+ struct i915_page_directory pd; /* GEN6-7 */
};
struct drm_i915_file_private *file_priv;
--
2.4.5
* [PATCH v6 07/19] drm/i915/gen8: implement alloc/free for 4lvl
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (5 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 10:05 ` [PATCH v7 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
` (13 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
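The delegation described above can be sketched roughly as follows: the range is cut at 512GB (PML4E) boundaries and each piece is handed to a 3-level routine. This is a simplified stand-alone model, not the driver code; walk_pml4_range, alloc_3lvl_fn, struct walk_stats and record_chunk are hypothetical names, and the chunking is intended to mirror what gen8_for_each_pml4e does in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define PML4E_SHIFT	39
#define PML4ES_PER_PML4	512

/* Hypothetical stand-in for the 3-level allocator; returns 0 on success. */
typedef int (*alloc_3lvl_fn)(uint32_t pml4e, uint64_t start,
			     uint64_t length, void *data);

/* Walk [start, start + length) one PML4 entry (512GB) at a time. */
static int walk_pml4_range(uint64_t start, uint64_t length,
			   alloc_3lvl_fn alloc_3lvl, void *data)
{
	const uint64_t span = 1ULL << PML4E_SHIFT;
	uint32_t pml4e = (start >> PML4E_SHIFT) & (PML4ES_PER_PML4 - 1);

	assert(alloc_3lvl != 0);

	while (length > 0 && pml4e < PML4ES_PER_PML4) {
		/* Bytes left before the next 512GB boundary. */
		uint64_t chunk = span - (start & (span - 1));
		int ret;

		if (chunk > length)
			chunk = length;

		ret = alloc_3lvl(pml4e, start, chunk, data);
		if (ret)
			return ret;

		start += chunk;
		length -= chunk;
		pml4e++;
	}

	return 0;
}

/* Example callback: only records what it was asked to allocate. */
struct walk_stats {
	unsigned int calls;
	uint64_t bytes;
	uint32_t first, last;
};

static int record_chunk(uint32_t pml4e, uint64_t start,
			uint64_t length, void *data)
{
	struct walk_stats *stats = data;

	(void)start;
	if (stats->calls == 0)
		stats->first = pml4e;
	stats->last = pml4e;
	stats->calls++;
	stats->bytes += length;

	return 0;
}
```

An 8KB range that starts 4KB below the first 512GB boundary is split into two 4KB pieces, one for PML4E 0 and one for PML4E 1; ranges that stay inside one entry result in a single call.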
v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v12: Fix pdpe start value in trace (Akash)
v13: Define all 4lvl functions in this patch directly, instead of
previous patches, add i915_page_directory_pointer_entry_alloc here,
use test_bit to detect when pdp is already allocated (Akash).
v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
function, as we do for pds and pts; move pd and pdp setup functions to
this patch (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 239 +++++++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_gem_gtt.h | 15 ++-
drivers/gpu/drm/i915/i915_trace.h | 8 ++
3 files changed, 245 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1a120a4..4179b80 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -210,6 +210,9 @@ static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
return pde;
}
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
static gen6_pte_t snb_pte_encode(dma_addr_t addr,
enum i915_cache_level level,
bool valid, u32 unused)
@@ -559,12 +562,73 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
pdp->page_directory = NULL;
}
+static struct
+i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
+{
+ struct i915_page_directory_pointer *pdp;
+ int ret = -ENOMEM;
+
+ WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+ pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
+ if (!pdp)
+ return ERR_PTR(-ENOMEM);
+
+ ret = __pdp_init(dev, pdp);
+ if (ret)
+ goto fail_bitmap;
+
+ ret = setup_px(dev, pdp);
+ if (ret)
+ goto fail_page_m;
+
+ return pdp;
+
+fail_page_m:
+ __pdp_fini(pdp);
+fail_bitmap:
+ kfree(pdp);
+
+ return ERR_PTR(ret);
+}
+
static void free_pdp(struct drm_device *dev,
struct i915_page_directory_pointer *pdp)
{
__pdp_fini(pdp);
- if (USES_FULL_48BIT_PPGTT(dev))
+ if (USES_FULL_48BIT_PPGTT(dev)) {
+ cleanup_px(dev, pdp);
kfree(pdp);
+ }
+}
+
+static void
+gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
+ struct i915_page_directory_pointer *pdp,
+ struct i915_page_directory *pd,
+ int index)
+{
+ gen8_ppgtt_pdpe_t *page_directorypo;
+
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+ return;
+
+ page_directorypo = kmap_px(pdp);
+ page_directorypo[index] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
+ kunmap_px(ppgtt, page_directorypo);
+}
+
+static void
+gen8_setup_page_directory_pointer(struct i915_hw_ppgtt *ppgtt,
+ struct i915_pml4 *pml4,
+ struct i915_page_directory_pointer *pdp,
+ int index)
+{
+ gen8_ppgtt_pml4e_t *pagemap = kmap_px(pml4);
+
+ WARN_ON(!USES_FULL_48BIT_PPGTT(ppgtt->base.dev));
+ pagemap[index] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
+ kunmap_px(ppgtt, pagemap);
}
/* Broadwell Page Directory Pointer Descriptors */
@@ -784,12 +848,9 @@ static void gen8_free_scratch(struct i915_address_space *vm)
free_scratch_page(dev, vm->scratch_page);
}
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
+ struct i915_page_directory_pointer *pdp)
{
- struct i915_hw_ppgtt *ppgtt =
- container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
- struct drm_device *dev = ppgtt->base.dev;
int i;
for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
@@ -801,6 +862,31 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
}
free_pdp(dev, pdp);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+ int i;
+
+ for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+ if (WARN_ON(!ppgtt->pml4.pdps[i]))
+ continue;
+
+ gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
+ }
+
+ cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+ gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
+ else
+ gen8_ppgtt_cleanup_4lvl(ppgtt);
gen8_free_scratch(vm);
}
@@ -922,6 +1008,60 @@ unwind_out:
return -ENOMEM;
}
+/**
+ * gen8_ppgtt_alloc_page_dirpointers() - Allocate pdps for VA range.
+ * @vm: Master vm structure.
+ * @pml4: Page map level 4 for this address range.
+ * @start: Starting virtual address to begin allocations.
+ * @length: Size of the allocations.
+ * @new_pdps: Bitmap set by function with new allocations. Likely used by the
+ * caller to free on error.
+ *
+ * Allocate the required number of page directory pointers. Extremely similar to
+ * gen8_ppgtt_alloc_page_directories() and gen8_ppgtt_alloc_pagetabs().
+ * The main difference is here we are limited by the pml4 boundary (instead of
+ * the page directory pointer).
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int
+gen8_ppgtt_alloc_page_dirpointers(struct i915_address_space *vm,
+ struct i915_pml4 *pml4,
+ uint64_t start,
+ uint64_t length,
+ unsigned long *new_pdps)
+{
+ struct drm_device *dev = vm->dev;
+ struct i915_page_directory_pointer *pdp;
+ uint64_t temp;
+ uint32_t pml4e;
+
+ WARN_ON(!bitmap_empty(new_pdps, GEN8_PML4ES_PER_PML4));
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+ if (!test_bit(pml4e, pml4->used_pml4es)) {
+ pdp = alloc_pdp(dev);
+ if (IS_ERR(pdp))
+ goto unwind_out;
+
+ pml4->pdps[pml4e] = pdp;
+ __set_bit(pml4e, new_pdps);
+ trace_i915_page_directory_pointer_entry_alloc(vm,
+ pml4e,
+ start,
+ GEN8_PML4E_SHIFT);
+ }
+ }
+
+ return 0;
+
+unwind_out:
+ for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+ free_pdp(dev, pml4->pdps[pml4e]);
+
+ return -ENOMEM;
+}
+
static void
free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
uint32_t pdpes)
@@ -983,14 +1123,15 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
}
-static int gen8_alloc_va_range(struct i915_address_space *vm,
- uint64_t start, uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
unsigned long *new_page_dirs, **new_page_tables;
struct drm_device *dev = vm->dev;
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct i915_page_directory *pd;
const uint64_t orig_start = start;
const uint64_t orig_length = length;
@@ -1071,6 +1212,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
kunmap_px(ppgtt, page_directory);
__set_bit(pdpe, pdp->used_pdpes);
+ gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1091,6 +1233,68 @@ err_out:
return ret;
}
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+ struct i915_pml4 *pml4,
+ uint64_t start,
+ uint64_t length)
+{
+ DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp;
+ uint64_t temp, pml4e;
+ int ret = 0;
+
+ /* Do the pml4 allocations first, so we don't need to track the newly
+ * allocated tables below the pdp */
+ bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+ /* The pagedirectory and pagetable allocations are done in the shared 3
+ * and 4 level code. Just allocate the pdps.
+ */
+ ret = gen8_ppgtt_alloc_page_dirpointers(vm, pml4, start, length,
+ new_pdps);
+ if (ret)
+ return ret;
+
+ WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+ "The allocation has spanned more than 512GB. "
+ "It is highly likely this is incorrect.");
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+ WARN_ON(!pdp);
+
+ ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+ if (ret)
+ goto err_out;
+
+ gen8_setup_page_directory_pointer(ppgtt, pml4, pdp, pml4e);
+ }
+
+ bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+ GEN8_PML4ES_PER_PML4);
+
+ return 0;
+
+err_out:
+ for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+ gen8_ppgtt_cleanup_3lvl(vm->dev, pml4->pdps[pml4e]);
+
+ return ret;
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (USES_FULL_48BIT_PPGTT(vm->dev))
+ return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+ else
+ return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+}
+
/*
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
* with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1116,9 +1320,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
- if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
- ret = __pdp_init(false, &ppgtt->pdp);
+ if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
+ if (ret)
+ goto free_scratch;
+ ppgtt->base.total = 1ULL << 48;
+ } else {
+ ret = __pdp_init(false, &ppgtt->pdp);
if (ret)
goto free_scratch;
@@ -1130,10 +1339,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
* 2GiB).
*/
ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
- } else {
- ppgtt->base.total = 1ULL << 48;
- ret = -EPERM; /* Not yet implemented */
- goto free_scratch;
+
+ trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
+ 0, 0,
+ GEN8_PML4E_SHIFT);
}
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 04bc66f..11d44b3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -39,6 +39,8 @@ struct drm_i915_file_private;
typedef uint32_t gen6_pte_t;
typedef uint64_t gen8_pte_t;
typedef uint64_t gen8_pde_t;
+typedef uint64_t gen8_ppgtt_pdpe_t;
+typedef uint64_t gen8_ppgtt_pml4e_t;
#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
@@ -95,6 +97,7 @@ typedef uint64_t gen8_pde_t;
*/
#define GEN8_PML4ES_PER_PML4 512
#define GEN8_PML4E_SHIFT 39
+#define GEN8_PML4E_MASK (GEN8_PML4ES_PER_PML4 - 1)
#define GEN8_PDPE_SHIFT 30
/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
* tables */
@@ -465,6 +468,15 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter) \
+ for (iter = gen8_pml4e_index(start); \
+ pdp = (pml4)->pdps[iter], \
+ length > 0 && iter < GEN8_PML4ES_PER_PML4; \
+ iter++, \
+ temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start, \
+ temp = min(temp, length), \
+ start += temp, length -= temp)
+
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
@@ -482,8 +494,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
static inline uint32_t gen8_pml4e_index(uint64_t address)
{
- WARN_ON(1); /* For 64B */
- return 0;
+ return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
}
static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f230d76..e6b5c74 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -221,6 +221,14 @@ DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_entry_alloc,
__entry->vm, __entry->px, __entry->start, __entry->end)
);
+DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_pointer_entry_alloc,
+ TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+ TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+ TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+ __entry->vm, __entry->px, __entry->start, __entry->end)
+);
+
/* Avoid extra math because we only support two sizes. The format is defined by
* bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
#define TRACE_PT_SIZE(bits) \
--
2.4.5
* [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (6 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 07/19] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 4:14 ` Goel, Akash
2015-07-30 10:06 ` [PATCH v7 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
` (12 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address of the PML4, while the other PDP registers are ignored.
In LRC, the addressing mode must be specified in every context
descriptor, and the base address of the PML4 is stored in the reg state.
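A minimal sketch of the two pieces described above, under stated assumptions: the enum values and the shift of 3 are copied from the intel_lrc.c hunk in this patch, while struct pdp0_regs, pml4_to_pdp0 and addressing_mode_bits are hypothetical names used only here. The 4KB alignment check is an assumption of the example (the PML4 occupies one page), not something the patch enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Addressing modes as enumerated in intel_lrc.c by this patch. */
enum ctx_addressing_mode {
	ADVANCED_CONTEXT = 0,
	LEGACY_32B_CONTEXT,
	ADVANCED_AD_CONTEXT,
	LEGACY_64B_CONTEXT
};

#define CTX_ADDRESSING_MODE_SHIFT 3

/* Hypothetical pair holding the two 32-bit halves written for PDP0. */
struct pdp0_regs {
	uint32_t udw;	/* upper dword of the PML4 base address */
	uint32_t ldw;	/* lower dword of the PML4 base address */
};

/* Split the PML4 DMA address the way ASSIGN_CTX_PML4 fills the reg state. */
static struct pdp0_regs pml4_to_pdp0(uint64_t pml4_dma)
{
	struct pdp0_regs regs;

	assert((pml4_dma & 0xfff) == 0);	/* example assumes a page-aligned PML4 */

	regs.udw = (uint32_t)(pml4_dma >> 32);
	regs.ldw = (uint32_t)pml4_dma;

	return regs;
}

/* Addressing-mode field of the context descriptor for 32b or 48b PPGTT. */
static uint64_t addressing_mode_bits(int use_48b)
{
	uint64_t mode = use_48b ? LEGACY_64B_CONTEXT : LEGACY_32B_CONTEXT;

	return mode << CTX_ADDRESSING_MODE_SHIFT;
}
```

With 48b enabled the mode field is 3 << 3; otherwise it is 1 << 3, which matches the value the descriptor carried before this patch (LEGACY_CONTEXT).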
v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v7: There is no need to update the pml4 register value in
execlists_update_context. (Akash)
v8: Move pd and pdp setup functions to a previous patch, they do not
belong here. (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 17 +++++++----
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_lrc.c | 60 ++++++++++++++++++++++++++-----------
3 files changed, 55 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4179b80..c6c8af7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -656,8 +656,8 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
return 0;
}
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
- struct drm_i915_gem_request *req)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct drm_i915_gem_request *req)
{
int i, ret;
@@ -672,6 +672,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
}
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct drm_i915_gem_request *req)
+{
+ return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
+}
+
static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
struct i915_page_directory_pointer *pdp,
uint64_t start,
@@ -1318,14 +1324,13 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
ppgtt->base.bind_vma = ppgtt_bind_vma;
- ppgtt->switch_mm = gen8_mm_switch;
-
if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
if (ret)
goto free_scratch;
ppgtt->base.total = 1ULL << 48;
+ ppgtt->switch_mm = gen8_48b_mm_switch;
} else {
ret = __pdp_init(false, &ppgtt->pdp);
if (ret)
@@ -1340,6 +1345,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
*/
ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+ ppgtt->switch_mm = gen8_legacy_mm_switch;
trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
0, 0,
GEN8_PML4E_SHIFT);
@@ -1537,8 +1543,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
int j;
for_each_ring(ring, dev_priv, j) {
+ u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_48B : 0;
I915_WRITE(RING_MODE_GEN7(ring),
- _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+ _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
}
}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3a77678..5bd1b6a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1670,6 +1670,7 @@ enum skl_disp_power_wells {
#define GFX_REPLAY_MODE (1<<11)
#define GFX_PSMI_GRANULARITY (1<<10)
#define GFX_PPGTT_ENABLE (1<<9)
+#define GEN8_GFX_PPGTT_48B (1<<7)
#define VLV_DISPLAY_BASE 0x180000
#define VLV_MIPI_BASE VLV_DISPLAY_BASE
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 99bba8e..0b65188 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -196,13 +196,21 @@
reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
}
+#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
+ reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(px_dma(&ppgtt->pml4)); \
+ reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
+}
+
enum {
ADVANCED_CONTEXT = 0,
- LEGACY_CONTEXT,
+ LEGACY_32B_CONTEXT,
ADVANCED_AD_CONTEXT,
LEGACY_64B_CONTEXT
};
-#define GEN8_CTX_MODE_SHIFT 3
+#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
+#define GEN8_CTX_ADDRESSING_MODE(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ LEGACY_64B_CONTEXT :\
+ LEGACY_32B_CONTEXT)
enum {
FAULT_AND_HANG = 0,
FAULT_AND_HALT, /* Debug only */
@@ -273,7 +281,7 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_request *rq)
WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
desc = GEN8_CTX_VALID;
- desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+ desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
if (IS_GEN8(ctx_obj->base.dev))
desc |= GEN8_CTX_L3LLC_COHERENT;
desc |= GEN8_CTX_PRIVILEGE;
@@ -348,10 +356,12 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
reg_state[CTX_RING_TAIL+1] = rq->tail;
reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
- /* True PPGTT with dynamic page allocation: update PDP registers and
- * point the unallocated PDPs to the scratch page
- */
- if (ppgtt) {
+ if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ /* True 32b PPGTT with dynamic page allocation: update PDP
+ * registers and point the unallocated PDPs to scratch page.
+ * PML4 is allocated during ppgtt init, so this is not needed
+ * in 48-bit mode.
+ */
ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
@@ -1512,12 +1522,15 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
* Ideally, we should set Force PD Restore in ctx descriptor,
* but we can't. Force Restore would be a second option, but
* it is unsafe in case of lite-restore (because the ctx is
- * not idle). */
+ * not idle). PML4 is allocated during ppgtt init so this is
+ * not needed in 48-bit.*/
if (req->ctx->ppgtt &&
(intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
- ret = intel_logical_ring_emit_pdps(req);
- if (ret)
- return ret;
+ if (GEN8_CTX_ADDRESSING_MODE(req->i915) == LEGACY_32B_CONTEXT) {
+ ret = intel_logical_ring_emit_pdps(req);
+ if (ret)
+ return ret;
+ }
req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
}
@@ -2198,13 +2211,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- /* With dynamic page allocation, PDPs may not be allocated at this point,
- * Point the unallocated PDPs to the scratch page
- */
- ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+ if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ /* 64b PPGTT (48bit canonical)
+ * PDP0_DESCRIPTOR contains the base address to PML4 and
+ * other PDP Descriptors are ignored.
+ */
+ ASSIGN_CTX_PML4(ppgtt, reg_state);
+ } else {
+ /* 32b PPGTT
+ * PDP*_DESCRIPTOR contains the base address of space supported.
+ * With dynamic page allocation, PDPs may not be allocated at
+ * this point. Point the unallocated PDPs to the scratch page
+ */
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+ }
+
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
--
2.4.5
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (7 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 4:19 ` Goel, Akash
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
` (11 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function only works at page directory granularity.
An object's pages may span the page directory, and so using the iter
directly as we write the PTEs allows the iterator to stay coherent
through a VMA insert operation spanning multiple page table levels.
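As a rough illustration of why the iterator is owned by the caller, here is
a minimal stand-alone sketch in plain C. It does not use the kernel
scatterlist API; page_iter, iter_next and insert_chunk are hypothetical
names made up for this example. The point is only that a cursor passed by
pointer lets a second call resume where the first one stopped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sg_page_iter: a cursor over a page list. */
struct page_iter {
	const uint64_t *pages;	/* DMA addresses, one per page */
	size_t count;		/* total number of pages */
	size_t next;		/* index of the next page to hand out */
};

/* Advance the cursor; returns 0 once every page has been consumed. */
static int iter_next(struct page_iter *it, uint64_t *addr)
{
	if (it->next >= it->count)
		return 0;
	*addr = it->pages[it->next++];
	return 1;
}

/*
 * Write at most 'room' entries into 'table', then stop. The room check
 * comes first so that no page is consumed without being written. Because
 * the caller owns the iterator, the next call resumes at the first
 * unwritten page. Returns the number of entries written.
 */
static size_t insert_chunk(struct page_iter *it, uint64_t *table, size_t room)
{
	size_t n = 0;
	uint64_t addr;

	while (n < room && iter_next(it, &addr))
		table[n++] = addr;
	return n;
}
```

Calling insert_chunk twice with the same iterator fills two tables back to
back, which is the behaviour the 4 level code needs when one object spans
more than one PDP.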
v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c6c8af7..7c024e98 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -749,7 +749,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
static void
gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
struct i915_page_directory_pointer *pdp,
- struct sg_table *pages,
+ struct sg_page_iter *sg_iter,
uint64_t start,
enum i915_cache_level cache_level)
{
@@ -759,11 +759,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
- struct sg_page_iter sg_iter;
pt_vaddr = NULL;
- for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+ while (__sg_page_iter_next(sg_iter)) {
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -771,7 +770,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
}
pt_vaddr[pte] =
- gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+ gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
cache_level, true);
if (++pte == GEN8_PTES) {
kunmap_px(ppgtt, pt_vaddr);
@@ -797,8 +796,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+ struct sg_page_iter sg_iter;
- gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+ __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+ gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
}
static void gen8_free_page_tables(struct drm_device *dev,
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (8 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 4:50 ` Goel, Akash
2015-08-03 8:53 ` [PATCH v9 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 11/19] drm/i915/gen8: Initialize PDPs and PML4 Michel Thierry
` (10 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.
Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
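For reference, the 4 level walk can be pictured as splitting the address
into one index per level. The sketch below is stand-alone C, not driver
code: the shift values (12/21/30/39) and the 9-bit mask are assumptions
based on 4KB pages and 512 entries per level, and gtt_indices and
split_address are hypothetical names.

```c
#include <stdint.h>

/* Assumed layout: 4KB pages and 512 entries (9 bits) per level. */
#define PTE_SHIFT	12
#define PDE_SHIFT	21
#define PDPE_SHIFT	30
#define PML4E_SHIFT	39
#define LEVEL_MASK	0x1ffu

/* One index per translation level, from the top (PML4) down to the PTE. */
struct gtt_indices {
	unsigned pml4e;
	unsigned pdpe;
	unsigned pde;
	unsigned pte;
};

/* Split a 48-bit address into its four table indices. */
static struct gtt_indices split_address(uint64_t addr)
{
	struct gtt_indices idx;

	idx.pml4e = (unsigned)((addr >> PML4E_SHIFT) & LEVEL_MASK);
	idx.pdpe  = (unsigned)((addr >> PDPE_SHIFT) & LEVEL_MASK);
	idx.pde   = (unsigned)((addr >> PDE_SHIFT) & LEVEL_MASK);
	idx.pte   = (unsigned)((addr >> PTE_SHIFT) & LEVEL_MASK);
	return idx;
}
```

Under these assumptions the pml4e index selects the PDP first, and the
remaining three indices are then used exactly as in the 3 level case.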
This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".
v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.
v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
adding an extra clamp function; remove unnecessary pdp_start/pdp_len
variables (Akash).
v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
length (Akash).
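The overflow mentioned in v6 is the usual "shift before widening" problem.
A small stand-alone sketch follows, assuming 4KB pages and a 32-bit count
of pages; length_32bit_shift and length_64bit_shift are hypothetical
helpers used only to show the difference.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed 4KB pages */

/* Problematic form: the shift is done in 32 bits and wraps before widening. */
static uint64_t length_32bit_shift(uint32_t nents)
{
	return nents << PAGE_SHIFT;
}

/* Widen first, then shift, so the full 64-bit length is kept. */
static uint64_t length_64bit_shift(uint32_t nents)
{
	return (uint64_t)nents << PAGE_SHIFT;
}
```

With 1 << 20 pages (4GB) the first helper yields 0, while the second yields
1ULL << 32, which is why the cast sits in front of the shift in
gen8_ppgtt_insert_entries.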
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 49 +++++++++++++++++++++++++++----------
1 file changed, 36 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7c024e98..7070d42 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -687,9 +687,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
- unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
- unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
- unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+ unsigned pdpe = gen8_pdpe_index(start);
+ unsigned pde = gen8_pde_index(start);
+ unsigned pte = gen8_pte_index(start);
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
@@ -725,7 +725,8 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
pte = 0;
if (++pde == I915_PDES) {
- pdpe++;
+ if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+ break;
pde = 0;
}
}
@@ -738,12 +739,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
I915_CACHE_LLC, use_scratch);
- gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
+ scratch_pte);
+ } else {
+ uint64_t templ4, pml4e;
+ struct i915_page_directory_pointer *pdp;
+
+ gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+ gen8_ppgtt_clear_pte_range(vm, pdp, start, length,
+ scratch_pte);
+ }
+ }
}
static void
@@ -756,9 +766,9 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
- unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
- unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
- unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+ unsigned pdpe = gen8_pdpe_index(start);
+ unsigned pde = gen8_pde_index(start);
+ unsigned pte = gen8_pte_index(start);
pt_vaddr = NULL;
@@ -776,7 +786,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
kunmap_px(ppgtt, pt_vaddr);
pt_vaddr = NULL;
if (++pde == I915_PDES) {
- pdpe++;
+ if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+ break;
pde = 0;
}
pte = 0;
@@ -795,11 +806,23 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct sg_page_iter sg_iter;
__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
- gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
+
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
+ cache_level);
+ } else {
+ struct i915_page_directory_pointer *pdp;
+ uint64_t templ4, pml4e;
+ uint64_t length = (uint64_t)pages->orig_nents << PAGE_SHIFT;
+
+ gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+ gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
+ start, cache_level);
+ }
+ }
}
static void gen8_free_page_tables(struct drm_device *dev,
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 11/19] drm/i915/gen8: Initialize PDPs and PML4
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (9 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 4:56 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 12/19] drm/i915: Expand error state's address width to 64b Michel Thierry
` (9 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pd before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out-of-bounds
access or proactive prefetch.
Also add a scratch pdp, which the PML4 entries point to.
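The initialization pattern is the same at every level: encode one entry
that points at the scratch structure of the level below, then write it to
every slot. A stand-alone sketch, under the assumption of 512 64-bit
entries per table; the flag bit and the names encode_entry and
fill_with_scratch are hypothetical and do not match the real gen8 encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512	/* assumed: 4KB table of 64-bit entries */
#define ENTRY_PRESENT 0x1ull	/* hypothetical "valid" bit */

/* Hypothetical encoder: page-aligned address of the next level plus a flag. */
static uint64_t encode_entry(uint64_t next_level_dma)
{
	return (next_level_dma & ~0xfffull) | ENTRY_PRESENT;
}

/* Point every slot of 'table' at the same scratch structure. */
static void fill_with_scratch(uint64_t *table, uint64_t scratch_dma)
{
	const uint64_t scratch = encode_entry(scratch_dma);
	size_t i;

	for (i = 0; i < ENTRIES_PER_TABLE; i++)
		table[i] = scratch;
}
```

After such a fill, a stray access through any unused slot lands on scratch
memory instead of on an uninitialized address.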
v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.
v4: Rebase after final merged version of Mika's ppgtt/scratch patches
(and removed commit message part related to v3).
v5: Update commit message to also mention PML4 table initialization and
the new scratch pdp (Akash).
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 38 +++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_gem_gtt.h | 1 +
2 files changed, 39 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7070d42..73cfe56 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -602,6 +602,27 @@ static void free_pdp(struct drm_device *dev,
}
}
+static void gen8_initialize_pdp(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp)
+{
+ gen8_ppgtt_pdpe_t scratch_pdpe;
+
+ scratch_pdpe = gen8_pdpe_encode(px_dma(vm->scratch_pd), I915_CACHE_LLC);
+
+ fill_px(vm->dev, pdp, scratch_pdpe);
+}
+
+static void gen8_initialize_pml4(struct i915_address_space *vm,
+ struct i915_pml4 *pml4)
+{
+ gen8_ppgtt_pml4e_t scratch_pml4e;
+
+ scratch_pml4e = gen8_pml4e_encode(px_dma(vm->scratch_pdp),
+ I915_CACHE_LLC);
+
+ fill_px(vm->dev, pml4, scratch_pml4e);
+}
+
static void
gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
struct i915_page_directory_pointer *pdp,
@@ -863,8 +884,20 @@ static int gen8_init_scratch(struct i915_address_space *vm)
return PTR_ERR(vm->scratch_pd);
}
+ if (USES_FULL_48BIT_PPGTT(dev)) {
+ vm->scratch_pdp = alloc_pdp(dev);
+ if (IS_ERR(vm->scratch_pdp)) {
+ free_pd(dev, vm->scratch_pd);
+ free_pt(dev, vm->scratch_pt);
+ free_scratch_page(dev, vm->scratch_page);
+ return PTR_ERR(vm->scratch_pdp);
+ }
+ }
+
gen8_initialize_pt(vm, vm->scratch_pt);
gen8_initialize_pd(vm, vm->scratch_pd);
+ if (USES_FULL_48BIT_PPGTT(dev))
+ gen8_initialize_pdp(vm, vm->scratch_pdp);
return 0;
}
@@ -873,6 +906,8 @@ static void gen8_free_scratch(struct i915_address_space *vm)
{
struct drm_device *dev = vm->dev;
+ if (USES_FULL_48BIT_PPGTT(dev))
+ free_pdp(dev, vm->scratch_pdp);
free_pd(dev, vm->scratch_pd);
free_pt(dev, vm->scratch_pt);
free_scratch_page(dev, vm->scratch_page);
@@ -1074,6 +1109,7 @@ gen8_ppgtt_alloc_page_dirpointers(struct i915_address_space *vm,
if (IS_ERR(pdp))
goto unwind_out;
+ gen8_initialize_pdp(vm, pdp);
pml4->pdps[pml4e] = pdp;
__set_bit(pml4e, new_pdps);
trace_i915_page_directory_pointer_entry_alloc(vm,
@@ -1353,6 +1389,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
if (ret)
goto free_scratch;
+ gen8_initialize_pml4(&ppgtt->base, &ppgtt->pml4);
+
ppgtt->base.total = 1ULL << 48;
ppgtt->switch_mm = gen8_48b_mm_switch;
} else {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 11d44b3..70c50e7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -278,6 +278,7 @@ struct i915_address_space {
struct i915_page_scratch *scratch_page;
struct i915_page_table *scratch_pt;
struct i915_page_directory *scratch_pd;
+ struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
/**
* List of objects currently involved in rendering.
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 12/19] drm/i915: Expand error state's address width to 64b
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (10 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 11/19] drm/i915/gen8: Initialize PDPs and PML4 Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 5:09 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 13/19] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
` (8 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
v2: For semaphore errors, object is mapped to GGTT and offset will not
be > 4GB, print only lower 32-bits (Akash)
v3: Print gtt_offset in groups of 32-bit (Chris)
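The v3 format splits the 64-bit offset into two 8-digit hex groups. A
stand-alone userspace sketch of the same idea is shown here; upper_32 and
lower_32 mirror what the kernel's upper_32_bits/lower_32_bits helpers do,
and format_gtt_offset is a hypothetical wrapper around snprintf.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Upper and lower halves of a 64-bit value. */
static uint32_t upper_32(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t lower_32(uint64_t v) { return (uint32_t)v; }

/* Format 'offset' as "UUUUUUUU_LLLLLLLL"; returns the snprintf result. */
static int format_gtt_offset(char *buf, size_t len, uint64_t offset)
{
	return snprintf(buf, len, "%08" PRIx32 "_%08" PRIx32,
			upper_32(offset), lower_32(offset));
}
```

Printing the halves separately avoids a 64-bit format specifier and keeps
the low word aligned with the existing 32-bit output.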
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 4 ++--
drivers/gpu/drm/i915/i915_gpu_error.c | 24 ++++++++++++++----------
2 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0b5cbe8..33926d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -546,7 +546,7 @@ struct drm_i915_error_state {
struct drm_i915_error_object {
int page_count;
- u32 gtt_offset;
+ u64 gtt_offset;
u32 *pages[0];
} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
@@ -572,7 +572,7 @@ struct drm_i915_error_state {
u32 size;
u32 name;
u32 rseqno[I915_NUM_RINGS], wseqno;
- u32 gtt_offset;
+ u64 gtt_offset;
u32 read_domains;
u32 write_domain;
s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6f42569..f79c952 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -197,8 +197,9 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
err_printf(m, " %s [%d]:\n", name, count);
while (count--) {
- err_printf(m, " %08x %8u %02x %02x [ ",
- err->gtt_offset,
+ err_printf(m, " %08x_%08x %8u %02x %02x [ ",
+ upper_32_bits(err->gtt_offset),
+ lower_32_bits(err->gtt_offset),
err->size,
err->read_domains,
err->write_domain);
@@ -426,15 +427,17 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
err_printf(m, " (submitted by %s [%d])",
error->ring[i].comm,
error->ring[i].pid);
- err_printf(m, " --- gtt_offset = 0x%08x\n",
- obj->gtt_offset);
+ err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
+ upper_32_bits(obj->gtt_offset),
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
obj = error->ring[i].wa_batchbuffer;
if (obj) {
err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
- dev_priv->ring[i].name, obj->gtt_offset);
+ dev_priv->ring[i].name,
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
@@ -453,14 +456,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ringbuffer)) {
err_printf(m, "%s --- ringbuffer = 0x%08x\n",
dev_priv->ring[i].name,
- obj->gtt_offset);
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
if ((obj = error->ring[i].hws_page)) {
err_printf(m, "%s --- HW Status = 0x%08x\n",
dev_priv->ring[i].name,
- obj->gtt_offset);
+ lower_32_bits(obj->gtt_offset));
offset = 0;
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -476,13 +479,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ctx)) {
err_printf(m, "%s --- HW Context = 0x%08x\n",
dev_priv->ring[i].name,
- obj->gtt_offset);
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
}
if ((obj = error->semaphore_obj)) {
- err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+ err_printf(m, "Semaphore page = 0x%08x\n",
+ lower_32_bits(obj->gtt_offset));
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
elt * 4,
@@ -590,7 +594,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
int num_pages;
bool use_ggtt;
int i = 0;
- u32 reloc_offset;
+ u64 reloc_offset;
if (src == NULL || src->pages == NULL)
return NULL;
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 13/19] drm/i915/gen8: Add ppgtt info and debug_dump
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (11 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 12/19] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 5:20 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 14/19] drm/i915: object size needs to be u64 Michel Thierry
` (7 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rely on used_px bits instead of null checking (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
drivers/gpu/drm/i915/i915_gem_gtt.c | 84 +++++++++++++++++++++++++++++++++++++
2 files changed, 94 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 23a69307..b6f1a13 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2250,7 +2250,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
- struct drm_file *file;
int i;
if (INTEL_INFO(dev)->gen == 6)
@@ -2273,13 +2272,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
ppgtt->debug_dump(ppgtt, m);
}
- list_for_each_entry_reverse(file, &dev->filelist, lhead) {
- struct drm_i915_file_private *file_priv = file->driver_priv;
-
- seq_printf(m, "proc: %s\n",
- get_pid_task(file->pid, PIDTYPE_PID)->comm);
- idr_for_each(&file_priv->context_idr, per_file_ctx, m);
- }
seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
}
@@ -2288,6 +2280,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+ struct drm_file *file;
int ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -2299,6 +2292,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
else if (INTEL_INFO(dev)->gen >= 6)
gen6_ppgtt_info(m, dev);
+ list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+ struct drm_i915_file_private *file_priv = file->driver_priv;
+
+ seq_printf(m, "\nproc: %s\n",
+ get_pid_task(file->pid, PIDTYPE_PID)->comm);
+ idr_for_each(&file_priv->context_idr, per_file_ctx,
+ (void *)(unsigned long)m);
+ }
+
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 73cfe56..0d7c7c1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1361,6 +1361,89 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
}
+static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
+ uint64_t start, uint64_t length,
+ gen8_pte_t scratch_pte,
+ struct seq_file *m)
+{
+ struct i915_page_directory *pd;
+ uint64_t temp;
+ uint32_t pdpe;
+
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+ struct i915_page_table *pt;
+ uint64_t pd_len = length;
+ uint64_t pd_start = start;
+ uint32_t pde;
+
+ if (!test_bit(pdpe, pdp->used_pdpes))
+ continue;
+
+ seq_printf(m, "\tPDPE #%d\n", pdpe);
+ gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+ uint32_t pte;
+ gen8_pte_t *pt_vaddr;
+
+ if (!test_bit(pde, pd->used_pdes))
+ continue;
+
+ pt_vaddr = kmap_px(pt);
+ for (pte = 0; pte < GEN8_PTES; pte += 4) {
+ uint64_t va =
+ (pdpe << GEN8_PDPE_SHIFT) |
+ (pde << GEN8_PDE_SHIFT) |
+ (pte << GEN8_PTE_SHIFT);
+ int i;
+ bool found = false;
+
+ for (i = 0; i < 4; i++)
+ if (pt_vaddr[pte + i] != scratch_pte)
+ found = true;
+ if (!found)
+ continue;
+
+ seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
+ for (i = 0; i < 4; i++) {
+ if (pt_vaddr[pte + i] != scratch_pte)
+ seq_printf(m, " %llx", pt_vaddr[pte + i]);
+ else
+ seq_puts(m, " SCRATCH ");
+ }
+ seq_puts(m, "\n");
+ }
+ /* don't use kunmap_px, it could trigger
+ * an unnecessary flush.
+ */
+ kunmap_atomic(pt_vaddr);
+ }
+ }
+}
+
+static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
+{
+ struct i915_address_space *vm = &ppgtt->base;
+ uint64_t start = ppgtt->base.start;
+ uint64_t length = ppgtt->base.total;
+ gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+ I915_CACHE_LLC, true);
+
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
+ } else {
+ uint64_t templ4, pml4e;
+ struct i915_pml4 *pml4 = &ppgtt->pml4;
+ struct i915_page_directory_pointer *pdp;
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
+ if (!test_bit(pml4e, pml4->used_pml4es))
+ continue;
+
+ seq_printf(m, " PML4E #%llu\n", pml4e);
+ gen8_dump_pdp(pdp, start, length, scratch_pte, m);
+ }
+ }
+}
+
/*
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
* with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1383,6 +1466,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.clear_range = gen8_ppgtt_clear_range;
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
ppgtt->base.bind_vma = ppgtt_bind_vma;
+ ppgtt->debug_dump = gen8_dump_ppgtt;
if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
--
2.4.5
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 14/19] drm/i915: object size needs to be u64
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (12 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 13/19] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 5:22 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 15/19] drm/i915: batch_obj vm offset must " Michel Thierry
` (6 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.
v2: Drop the warning about bind with size 0, it shouldn't happen anyway.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5d68578..80f5d97 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3348,7 +3348,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
{
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
- u32 size, fence_size, fence_alignment, unfenced_alignment;
+ u32 fence_alignment, unfenced_alignment;
+ u64 size, fence_size;
u64 start =
flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
u64 end =
@@ -3407,7 +3408,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
* attempt to find space.
*/
if (size > end) {
- DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
+ DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
ggtt_view ? ggtt_view->type : 0,
size,
flags & PIN_MAPPABLE ? "mappable" : "total",
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 15/19] drm/i915: batch_obj vm offset must be u64
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (13 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 14/19] drm/i915: object size needs to be u64 Michel Thierry
@ 2015-07-29 16:23 ` Michel Thierry
2015-07-30 5:23 ` Goel, Akash
2015-08-05 16:01 ` Daniel Vetter
2015-07-29 16:24 ` [PATCH v6 16/19] drm/i915/userptr: Kill user_size limit check Michel Thierry
` (5 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:23 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Otherwise it can overflow in 48-bit mode, and cause an incorrect
exec_start.
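A stand-alone sketch of the failure mode, assuming exec_start is computed
as the batch object's vm offset plus the batch start offset. The structs
and helper names below are hypothetical; only the width of the stored
offset differs between the two variants.

```c
#include <stdint.h>

/* Hypothetical parameter blocks; only the offset field width differs. */
struct params_u32 {
	uint32_t batch_obj_vm_offset;
	uint32_t start_offset;
};

struct params_u64 {
	uint64_t batch_obj_vm_offset;
	uint32_t start_offset;
};

/* Storing the offset in 32 bits silently drops everything above 4GB. */
static uint64_t exec_start_u32(uint64_t vm_offset, uint32_t start_offset)
{
	struct params_u32 p = { (uint32_t)vm_offset, start_offset };

	return (uint64_t)p.batch_obj_vm_offset + p.start_offset;
}

/* With a 64-bit field the full address survives. */
static uint64_t exec_start_u64(uint64_t vm_offset, uint32_t start_offset)
{
	struct params_u64 p = { vm_offset, start_offset };

	return p.batch_obj_vm_offset + p.start_offset;
}
```

For a batch placed at 0x100001000 with a start offset of 0x40, the 32-bit
variant returns 0x1040 and the 64-bit variant returns 0x100001040.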
Before commit 5f19e2bffa63a91cd4ac1adcec648e14a44277ce ("drm/i915: Merged
the many do_execbuf() parameters into a structure"), it was already a u64.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 33926d9..ed2fbcd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1674,7 +1674,7 @@ struct i915_execbuffer_params {
struct drm_file *file;
uint32_t dispatch_flags;
uint32_t args_batch_start_offset;
- uint32_t batch_obj_vm_offset;
+ uint64_t batch_obj_vm_offset;
struct intel_engine_cs *ring;
struct drm_i915_gem_object *batch_obj;
struct intel_context *ctx;
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 16/19] drm/i915/userptr: Kill user_size limit check
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (14 preceding siblings ...)
2015-07-29 16:23 ` [PATCH v6 15/19] drm/i915: batch_obj vm offset must " Michel Thierry
@ 2015-07-29 16:24 ` Michel Thierry
2015-07-30 5:25 ` Goel, Akash
2015-07-29 16:24 ` [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
` (4 subsequent siblings)
20 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:24 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
The GTT was only 32b, so its maximum size was 4GB. In order to allow
objects bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl could check
against the max 48b range (1ULL << 48) instead.
But since the check no longer applies, just kill the limit.
v2: Use the default ctx to infer the ppgtt max size (Akash).
v3: Just kill the limit, it was only there for early detection of an
error when used for execbuffer (Chris).
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 8fd431b..d11901d 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -813,7 +813,6 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
int
i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
{
- struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_i915_gem_userptr *args = data;
struct drm_i915_gem_object *obj;
int ret;
@@ -826,9 +825,6 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
if (offset_in_page(args->user_ptr | args->user_size))
return -EINVAL;
- if (args->user_size > dev_priv->gtt.base.total)
- return -E2BIG;
-
if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
(char __user *)(unsigned long)args->user_ptr, args->user_size))
return -EFAULT;
--
2.4.5
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (15 preceding siblings ...)
2015-07-29 16:24 ` [PATCH v6 16/19] drm/i915/userptr: Kill user_size limit check Michel Thierry
@ 2015-07-29 16:24 ` Michel Thierry
2015-07-30 5:39 ` Goel, Akash
2015-08-05 15:58 ` Daniel Vetter
2015-07-29 16:24 ` [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch Michel Thierry
` (3 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:24 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
There are some allocations that must only be referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.
Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.
Objects must have the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag set to indicate
that they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
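For illustration only (not part of the patch): a userspace sketch of how the
pin flags described above could be derived and how they clamp the end of the
search range. PIN_ZONE_4G and PIN_HIGH use the bit values from the i915_drv.h
hunk below; the value of PIN_MAPPABLE and all sketch_ names are assumptions
for this example. The real code only sets PIN_HIGH when the node is not yet
allocated, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PIN_MAPPABLE (1u << 0) /* assumed value */
#define SKETCH_PIN_ZONE_4G  (1u << 6) /* as in the i915_drv.h hunk */
#define SKETCH_PIN_HIGH     (1u << 7) /* as in the i915_drv.h hunk */

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Unflagged objects are restricted to the first 4GB; objects that do not
 * need a mappable placement are pinned high to keep the low range free. */
static unsigned int sketch_pin_flags(bool supports_48b, bool needs_map)
{
	unsigned int flags = 0;

	if (!supports_48b)
		flags |= SKETCH_PIN_ZONE_4G;
	if (needs_map)
		flags |= SKETCH_PIN_MAPPABLE;
	else
		flags |= SKETCH_PIN_HIGH;
	return flags;
}

/* End of the allowed range: start from the full address space and apply
 * each restriction in turn, so the smallest limit wins. */
static uint64_t sketch_range_end(uint64_t vm_total, uint64_t mappable_end,
				 unsigned int flags)
{
	uint64_t end = vm_total;

	if (flags & SKETCH_PIN_MAPPABLE)
		end = sketch_min_u64(end, mappable_end);
	if (flags & SKETCH_PIN_ZONE_4G)
		end = sketch_min_u64(end, 1ULL << 32);
	return end;
}
```

Applying both limits with a minimum, rather than choosing one or the other,
is what lets a request that is both mappable and zone-4G end up with the
tighter of the two bounds.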
v2: Changed flag logic from needs_32b, to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
v6: Apply pin-high for ggtt too (Chris)
v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
Fix check for entries currently using +4GB addresses, use min_t and
other polish in object_bind_to_vm (Chris)
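For illustration only (not part of the patch): a userspace sketch of the
"+4GB" check mentioned in v7. The name sketch_needs_rebind is hypothetical,
and the sketch assumes node_size is nonzero. An object without 48b support is
considered misplaced when its last byte lies at or above the 4GB boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an object that cannot use 48b addresses currently occupies
 * any address at or above 4GB. The last byte is start + size - 1, so a
 * node ending exactly at the 4GB boundary is still acceptable. */
static bool sketch_needs_rebind(uint64_t node_start, uint64_t node_size,
				bool supports_48b)
{
	if (supports_48b)
		return false;
	return ((node_start + node_size - 1) >> 32) != 0;
}
```

Testing the last byte rather than the start address catches objects that
begin below 4GB but straddle the boundary.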
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 ++
drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++------
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++++++++++
include/uapi/drm/i915_drm.h | 3 ++-
4 files changed, 36 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ed2fbcd..c344805 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2775,6 +2775,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
#define PIN_OFFSET_BIAS (1<<3)
#define PIN_USER (1<<4)
#define PIN_UPDATE (1<<5)
+#define PIN_ZONE_4G (1<<6)
+#define PIN_HIGH (1<<7)
#define PIN_OFFSET_MASK (~4095)
int __must_check
i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 80f5d97..e1ca63f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3349,11 +3349,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
u32 fence_alignment, unfenced_alignment;
+ u32 search_flag, alloc_flag;
+ u64 start, end;
u64 size, fence_size;
- u64 start =
- flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
- u64 end =
- flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
struct i915_vma *vma;
int ret;
@@ -3393,6 +3391,13 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
}
+ start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
+ end = vm->total;
+ if (flags & PIN_MAPPABLE)
+ end = min_t(u64, end, dev_priv->gtt.mappable_end);
+ if (flags & PIN_ZONE_4G)
+ end = min_t(u64, end, (1ULL << 32));
+
if (alignment == 0)
alignment = flags & PIN_MAPPABLE ? fence_alignment :
unfenced_alignment;
@@ -3428,13 +3433,21 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
if (IS_ERR(vma))
goto err_unpin;
+ if (flags & PIN_HIGH) {
+ search_flag = DRM_MM_SEARCH_BELOW;
+ alloc_flag = DRM_MM_CREATE_TOP;
+ } else {
+ search_flag = DRM_MM_SEARCH_DEFAULT;
+ alloc_flag = DRM_MM_CREATE_DEFAULT;
+ }
+
search_free:
ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
size, alignment,
obj->cache_level,
start, end,
- DRM_MM_SEARCH_DEFAULT,
- DRM_MM_CREATE_DEFAULT);
+ search_flag,
+ alloc_flag);
if (ret) {
ret = i915_gem_evict_something(dev, vm, size, alignment,
obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 923a3c4..78fc881 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -589,11 +589,20 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
flags |= PIN_GLOBAL;
+ /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+ * limit address to the first 4GBs for unflagged objects.
+ */
+ flags |= PIN_ZONE_4G;
+ if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
+ flags &= ~PIN_ZONE_4G;
+
if (!drm_mm_node_allocated(&vma->node)) {
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_GLOBAL | PIN_MAPPABLE;
if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+ if ((flags & PIN_MAPPABLE) == 0)
+ flags |= PIN_HIGH;
}
ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
@@ -671,6 +680,10 @@ eb_vma_misplaced(struct i915_vma *vma)
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
return !only_mappable_for_reloc(entry->flags);
+ if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
+ (vma->node.start + vma->node.size - 1) >> 32)
+ return true;
+
return false;
}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dbd16a2..08e047c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -690,7 +690,8 @@ struct drm_i915_gem_exec_object2 {
#define EXEC_OBJECT_NEEDS_FENCE (1<<0)
#define EXEC_OBJECT_NEEDS_GTT (1<<1)
#define EXEC_OBJECT_WRITE (1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
__u64 flags;
__u64 rsvd1;
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (16 preceding siblings ...)
2015-07-29 16:24 ` [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-07-29 16:24 ` Michel Thierry
2015-07-30 5:49 ` Goel, Akash
2015-07-30 10:09 ` [PATCH v7 " Michel Thierry
2015-07-29 16:24 ` [PATCH v6 19/19] drm/i915: Save some page table setup on repeated binds Michel Thierry
` (2 subsequent siblings)
20 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:24 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Use 48b addresses if hw supports it (i915.enable_ppgtt=3).
Note, aliasing PPGTT remains 32b only.
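For illustration only (not part of the patch): a userspace sketch of the
auto-selection tail of sanitize_enable_ppgtt after this change. The struct
sketch_hw and the function name are assumptions for this example; the
explicit module-parameter overrides, the vGPU case and the IOMMU checks in
the real function are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device info the real code queries. */
struct sketch_hw {
	int gen;           /* hardware generation, e.g. 8 for Broadwell */
	bool is_broadwell; /* stands in for IS_BROADWELL(dev) */
	bool execlists;    /* stands in for i915.enable_execlists */
};

/* Returns the auto-selected mode:
 * 0=disabled, 1=aliasing, 2=full, 3=full_64b. */
static int sketch_auto_ppgtt(const struct sketch_hw *hw)
{
	bool has_aliasing_ppgtt = hw->gen >= 6;
	bool has_full_64bit_ppgtt = hw->is_broadwell || hw->gen >= 9;

	if (hw->gen >= 8 && hw->execlists)
		return has_full_64bit_ppgtt ? 3 : 2;
	return has_aliasing_ppgtt ? 1 : 0;
}
```

In this sketch a gen8 part that is not Broadwell still gets mode 2 with
execlists, and any part without execlists falls back to aliasing PPGTT.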
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++---
drivers/gpu/drm/i915/i915_params.c | 2 +-
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0d7c7c1..a7d3c07 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -108,8 +108,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
- has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
- INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+ has_full_64bit_ppgtt = IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9;
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -147,7 +146,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
}
if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
- return 2;
+ return has_full_64bit_ppgtt ? 3 : 2;
else
return has_aliasing_ppgtt ? 1 : 0;
}
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 5ae4b0a..d961440 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
MODULE_PARM_DESC(enable_ppgtt,
"Override PPGTT usage. "
- "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+ "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
MODULE_PARM_DESC(enable_execlists,
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v6 19/19] drm/i915: Save some page table setup on repeated binds
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (17 preceding siblings ...)
2015-07-29 16:24 ` [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch Michel Thierry
@ 2015-07-29 16:24 ` Michel Thierry
2015-07-30 11:26 ` [PATCH v6 00/19] 48-bit PPGTT Chris Wilson
2015-08-03 9:51 ` Michel Thierry
20 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-29 16:24 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Check if the required page tables are already allocated; if so, we can
skip the inner loop of pdes altogether and move to the next page
directory.
If the new allocation differs from the existing one (i.e. the new
allocation spans more ptes than earlier allocations already cover),
the used_ptes bitmap may not get updated correctly, but none of the
code checks rely on this.
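For illustration only (not part of the patch): a userspace sketch of the
subset test that decides whether the inner loop can be skipped.
sketch_bitmap_subset is a hypothetical stand-in written for this example; it
is not the kernel's bitmap_subset() implementation, only a function with the
same intent (every bit set in the first bitmap is also set in the second,
over the first nbits bits).

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* True when, within the first nbits bits, a has no bit set that b lacks. */
static bool sketch_bitmap_subset(const unsigned long *a,
				 const unsigned long *b, size_t nbits)
{
	size_t full = nbits / SKETCH_BITS_PER_LONG;
	size_t rem = nbits % SKETCH_BITS_PER_LONG;
	size_t i;

	/* Whole words: any bit in a that is missing from b fails the test. */
	for (i = 0; i < full; i++)
		if (a[i] & ~b[i])
			return false;

	/* Trailing partial word: only the low rem bits count. */
	if (rem) {
		unsigned long mask = (1UL << rem) - 1;

		if ((a[full] & ~b[full]) & mask)
			return false;
	}
	return true;
}
```

Here a plays the role of new_page_tables[pdpe] and b the role of
pd->used_pdes: when the requested page tables are all already in use, there
is nothing new to map for that page directory.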
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a7d3c07..13cf23c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1249,6 +1249,16 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
/* Every pd should be allocated, we just did that above. */
WARN_ON(!pd);
+ /* Check if the required page tables are already allocated /
+ * mapped; if so, we can skip altogether the inner loop of pdes,
+ * and move to the next page directory.
+ */
+ if (bitmap_subset(new_page_tables[pdpe], pd->used_pdes,
+ I915_PDES)) {
+ kunmap_px(ppgtt, page_directory);
+ continue;
+ }
+
gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
/* Same reasoning as pd */
WARN_ON(!pt);
--
2.4.5
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd
2015-07-29 16:23 ` [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
@ 2015-07-30 3:06 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 3:06 UTC (permalink / raw)
To: Michel Thierry, intel-gfx; +Cc: akash.goel
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> gen8_clamp_pd clamps to the next page directory boundary, but the macro
> gen8_for_each_pde already has a check to stop at the page directory
> boundary.
>
> Furthermore, i915_pte_count also restricts to the next page table
> boundary.
>
> v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
>
> Suggested-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
> drivers/gpu/drm/i915/i915_gem_gtt.h | 11 -----------
> 2 files changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index c2a291e..189572d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -955,7 +955,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> gen8_pde_t *const page_directory = kmap_px(pd);
> struct i915_page_table *pt;
> - uint64_t pd_len = gen8_clamp_pd(start, length);
> + uint64_t pd_len = length;
> uint64_t pd_start = start;
> uint32_t pde;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index e1cfa29..d5bf953 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -444,17 +444,6 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
> temp = min(temp, length), \
> start += temp, length -= temp)
>
> -/* Clamp length to the next page_directory boundary */
> -static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
> -{
> - uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
> -
> - if (next_pd > (start + length))
> - return length;
> -
> - return next_pd - start;
> -}
> -
> static inline uint32_t gen8_pte_index(uint64_t address)
> {
> return i915_pte_index(address, GEN8_PDE_SHIFT);
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic
2015-07-29 16:23 ` [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
@ 2015-07-30 3:18 ` Goel, Akash
2015-08-05 15:31 ` Daniel Vetter
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 3:18 UTC (permalink / raw)
To: Michel Thierry, intel-gfx; +Cc: akash.goel
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> This transitional patch doesn't do much for the existing code. However,
> it should make upcoming patches to use the full 48b address space a bit
> easier.
>
> v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
> v3: To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
> v4: Rebase after s/page_tables/page_table/, added extra information
> about 4-level page table formats and use IS_ENABLED macro.
> v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
> v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
> follow his nomenclature in pdp functions (there is no alloc_pdp yet).
> v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
> v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
> v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
> v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 86 +++++++++++++++++++++++++++++--------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 17 +++++---
> 2 files changed, 80 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 189572d..28f3227 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -522,6 +522,43 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
> fill_px(vm->dev, pd, scratch_pde);
> }
>
> +static int __pdp_init(struct drm_device *dev,
> + struct i915_page_directory_pointer *pdp)
> +{
> + size_t pdpes = I915_PDPES_PER_PDP(dev);
> +
> + pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
> + sizeof(unsigned long),
> + GFP_KERNEL);
> + if (!pdp->used_pdpes)
> + return -ENOMEM;
> +
> + pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
> + GFP_KERNEL);
> + if (!pdp->page_directory) {
> + kfree(pdp->used_pdpes);
> + /* the PDP might be the statically allocated top level. Keep it
> + * as clean as possible */
> + pdp->used_pdpes = NULL;
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static void __pdp_fini(struct i915_page_directory_pointer *pdp)
> +{
> + kfree(pdp->used_pdpes);
> + kfree(pdp->page_directory);
> + pdp->page_directory = NULL;
> +}
> +
> +static void free_pdp(struct drm_device *dev,
> + struct i915_page_directory_pointer *pdp)
> +{
> + __pdp_fini(pdp);
> +}
> +
> /* Broadwell Page Directory Pointer Descriptors */
> static int gen8_write_pdp(struct drm_i915_gem_request *req,
> unsigned entry,
> @@ -720,7 +757,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> container_of(vm, struct i915_hw_ppgtt, base);
> int i;
>
> - for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
> + for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> + I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> continue;
>
> @@ -729,6 +767,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
> }
>
> + free_pdp(ppgtt->base.dev, &ppgtt->pdp);
> gen8_free_scratch(vm);
> }
>
> @@ -763,7 +802,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>
> gen8_for_each_pde(pt, pd, start, length, temp, pde) {
> /* Don't reallocate page tables */
> - if (pt) {
> + if (test_bit(pde, pd->used_pdes)) {
> /* Scratch is never allocated this way */
> WARN_ON(pt == ppgtt->base.scratch_pt);
> continue;
> @@ -820,11 +859,12 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> struct i915_page_directory *pd;
> uint64_t temp;
> uint32_t pdpe;
> + uint32_t pdpes = I915_PDPES_PER_PDP(dev);
>
> - WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
> + WARN_ON(!bitmap_empty(new_pds, pdpes));
>
> gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> - if (pd)
> + if (test_bit(pdpe, pdp->used_pdpes))
> continue;
>
> pd = alloc_pd(dev);
> @@ -839,18 +879,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> return 0;
>
> unwind_out:
> - for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
> + for_each_set_bit(pdpe, new_pds, pdpes)
> free_pd(dev, pdp->page_directory[pdpe]);
>
> return -ENOMEM;
> }
>
> static void
> -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
> +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
> + uint32_t pdpes)
> {
> int i;
>
> - for (i = 0; i < GEN8_LEGACY_PDPES; i++)
> + for (i = 0; i < pdpes; i++)
> kfree(new_pts[i]);
> kfree(new_pts);
> kfree(new_pds);
> @@ -861,23 +902,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
> */
> static
> int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> - unsigned long ***new_pts)
> + unsigned long ***new_pts,
> + uint32_t pdpes)
> {
> int i;
> unsigned long *pds;
> unsigned long **pts;
>
> - pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
> + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
> if (!pds)
> return -ENOMEM;
>
> - pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
> + pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
> if (!pts) {
> kfree(pds);
> return -ENOMEM;
> }
>
> - for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
> + for (i = 0; i < pdpes; i++) {
> pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
> sizeof(unsigned long), GFP_KERNEL);
> if (!pts[i])
> @@ -890,7 +932,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> return 0;
>
> err_out:
> - free_gen8_temp_bitmaps(pds, pts);
> + free_gen8_temp_bitmaps(pds, pts, pdpes);
> return -ENOMEM;
> }
>
> @@ -916,6 +958,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> const uint64_t orig_length = length;
> uint64_t temp;
> uint32_t pdpe;
> + uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
> int ret;
>
> /* Wrap is never okay since we can only represent 48b, and we don't
> @@ -927,7 +970,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> if (WARN_ON(start + length > ppgtt->base.total))
> return -ENODEV;
>
> - ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
> + ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
> if (ret)
> return ret;
>
> @@ -935,7 +978,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
> new_page_dirs);
> if (ret) {
> - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> return ret;
> }
>
> @@ -989,7 +1032,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> __set_bit(pdpe, ppgtt->pdp.used_pdpes);
> }
>
> - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> mark_tlbs_dirty(ppgtt);
> return 0;
>
> @@ -999,10 +1042,10 @@ err_out:
> free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
> }
>
> - for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
> + for_each_set_bit(pdpe, new_page_dirs, pdpes)
> free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
>
> - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> mark_tlbs_dirty(ppgtt);
> return ret;
> }
> @@ -1040,7 +1083,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> + ret = __pdp_init(false, &ppgtt->pdp);
> +
> + if (ret)
> + goto free_scratch;
> +
> return 0;
> +
> +free_scratch:
> + gen8_free_scratch(&ppgtt->base);
> + return ret;
> }
>
> static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d5bf953..87e389c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -98,6 +98,9 @@ typedef uint64_t gen8_pde_t;
> #define GEN8_LEGACY_PDPES 4
> #define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
>
> +/* FIXME: Next patch will use dev */
> +#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
> +
> #define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
> #define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
> #define PPAT_CACHED_INDEX _PAGE_PAT /* WB LLCeLLC */
> @@ -241,9 +244,10 @@ struct i915_page_directory {
> };
>
> struct i915_page_directory_pointer {
> - /* struct page *page; */
> - DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
> - struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
> + struct i915_page_dma base;
> +
> + unsigned long *used_pdpes;
> + struct i915_page_directory **page_directory;
> };
>
> struct i915_address_space {
> @@ -436,9 +440,10 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
> temp = min(temp, length), \
> start += temp, length -= temp)
>
> -#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
> - for (iter = gen8_pdpe_index(start); \
> - pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES; \
> +#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
> + for (iter = gen8_pdpe_index(start); \
> + pd = (pdp)->page_directory[iter], \
> + length > 0 && (iter < I915_PDPES_PER_PDP(dev)); \
> iter++, \
> temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start, \
> temp = min(temp, length), \
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 05/19] drm/i915/gen8: Add dynamic page trace events
2015-07-29 16:23 ` [PATCH v6 05/19] drm/i915/gen8: Add dynamic page trace events Michel Thierry
@ 2015-07-30 3:48 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 3:48 UTC (permalink / raw)
To: Michel Thierry, intel-gfx; +Cc: akash.goel
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> The dynamic page allocation patch series added it for GEN6, this patch
> adds them for GEN8.
>
> v2: Consolidate pagetable/page_directory events
> v3: Multiple rebases.
> v4: Rebase after s/page_tables/page_table/.
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> v6: Rebase after gen8_map_pagetable_range removal.
> v7: Use generic page name (px) in DECLARE_EVENT_CLASS (Akash)
> v8: Defer define of i915_page_directory_pointer_entry_alloc (Akash)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++++++
> drivers/gpu/drm/i915/i915_trace.h | 24 ++++++++++++++++--------
> 2 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index f338a13..8c1db92 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -844,6 +844,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
> gen8_initialize_pt(vm, pt);
> pd->page_table[pde] = pt;
> __set_bit(pde, new_pts);
> + trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
> }
>
> return 0;
> @@ -904,6 +905,7 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
> gen8_initialize_pd(vm, pd);
> pdp->page_directory[pdpe] = pd;
> __set_bit(pdpe, new_pds);
> + trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
> }
>
> return 0;
> @@ -1053,6 +1055,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> /* Map the PDE to the page table */
> page_directory[pde] = gen8_pde_encode(px_dma(pt),
> I915_CACHE_LLC);
> + trace_i915_page_table_entry_map(&ppgtt->base, pde, pt,
> + gen8_pte_index(start),
> + gen8_pte_count(start, length),
> + GEN8_PTES);
>
> /* NB: We haven't yet mapped ptes to pages. At this
> * point we're still relying on insert_entries() */
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 2f34c47..f230d76 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -186,33 +186,41 @@ DEFINE_EVENT(i915_va, i915_va_alloc,
> TP_ARGS(vm, start, length, name)
> );
>
> -DECLARE_EVENT_CLASS(i915_page_table_entry,
> - TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> - TP_ARGS(vm, pde, start, pde_shift),
> +DECLARE_EVENT_CLASS(i915_px_entry,
> + TP_PROTO(struct i915_address_space *vm, u32 px, u64 start, u64 px_shift),
> + TP_ARGS(vm, px, start, px_shift),
>
> TP_STRUCT__entry(
> __field(struct i915_address_space *, vm)
> - __field(u32, pde)
> + __field(u32, px)
> __field(u64, start)
> __field(u64, end)
> ),
>
> TP_fast_assign(
> __entry->vm = vm;
> - __entry->pde = pde;
> + __entry->px = px;
> __entry->start = start;
> - __entry->end = ((start + (1ULL << pde_shift)) & ~((1ULL << pde_shift)-1)) - 1;
> + __entry->end = ((start + (1ULL << px_shift)) & ~((1ULL << px_shift)-1)) - 1;
> ),
>
> TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)",
> - __entry->vm, __entry->pde, __entry->start, __entry->end)
> + __entry->vm, __entry->px, __entry->start, __entry->end)
> );
>
> -DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
> +DEFINE_EVENT(i915_px_entry, i915_page_table_entry_alloc,
> TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 pde_shift),
> TP_ARGS(vm, pde, start, pde_shift)
> );
>
> +DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_entry_alloc,
> + TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
> + TP_ARGS(vm, pdpe, start, pdpe_shift),
> +
> + TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
> + __entry->vm, __entry->px, __entry->start, __entry->end)
> +);
> +
> /* Avoid extra math because we only support two sizes. The format is defined by
> * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
> #define TRACE_PT_SIZE(bits) \
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure
2015-07-29 16:23 ` [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure Michel Thierry
@ 2015-07-30 4:01 ` Goel, Akash
2015-07-30 9:31 ` Michel Thierry
2015-07-30 10:04 ` [PATCH v7 " Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 4:01 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> Introduces the Page Map Level 4 (PML4), ie. the new top level structure
> of the page tables.
>
> To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
>
> v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
> 32/64-bit safe (Chris).
> v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 3 ++-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 38 ++++++++++++++++++++++++-------------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 26 ++++++++++++++++++++-----
> 3 files changed, 48 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 40fea41..0b5cbe8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
> #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
> #define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
> #define USES_PPGTT(dev) (i915.enable_ppgtt)
> -#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
> +#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
> +#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
>
> #define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
> #define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 8c1db92..1a120a4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -104,9 +104,12 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> {
> bool has_aliasing_ppgtt;
> bool has_full_ppgtt;
> + bool has_full_64bit_ppgtt;
>
> has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
> has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
> + has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
> + INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
>
> if (intel_vgpu_active(dev))
> has_full_ppgtt = false; /* emulation is too hard */
> @@ -125,6 +128,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> if (enable_ppgtt == 2 && has_full_ppgtt)
> return 2;
>
> + if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
> + return 3;
> +
> #ifdef CONFIG_INTEL_IOMMU
> /* Disable ppgtt on SNB if VT-d is on. */
> if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
> @@ -557,6 +563,8 @@ static void free_pdp(struct drm_device *dev,
> struct i915_page_directory_pointer *pdp)
> {
> __pdp_fini(pdp);
> + if (USES_FULL_48BIT_PPGTT(dev))
> + kfree(pdp);
Sorry for the late comment.
This change is a bit of a distraction here; it should be moved to the
following 'alloc/free for 4lvl' patch.
Best regards
Akash
> }
>
> /* Broadwell Page Directory Pointer Descriptors */
> @@ -686,9 +694,6 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> pt_vaddr = NULL;
>
> for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> - if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
> - break;
> -
> if (pt_vaddr == NULL) {
> struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> @@ -1102,14 +1107,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> return ret;
>
> ppgtt->base.start = 0;
> - ppgtt->base.total = 1ULL << 32;
> - if (IS_ENABLED(CONFIG_X86_32))
> - /* While we have a proliferation of size_t variables
> - * we cannot represent the full ppgtt size on 32bit,
> - * so limit it to the same size as the GGTT (currently
> - * 2GiB).
> - */
> - ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> ppgtt->base.allocate_va_range = gen8_alloc_va_range;
> ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> @@ -1119,10 +1116,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> - ret = __pdp_init(false, &ppgtt->pdp);
> + if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + ret = __pdp_init(false, &ppgtt->pdp);
>
> - if (ret)
> + if (ret)
> + goto free_scratch;
> +
> + ppgtt->base.total = 1ULL << 32;
> + if (IS_ENABLED(CONFIG_X86_32))
> + /* While we have a proliferation of size_t variables
> + * we cannot represent the full ppgtt size on 32bit,
> + * so limit it to the same size as the GGTT (currently
> + * 2GiB).
> + */
> + ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> + } else {
> + ppgtt->base.total = 1ULL << 48;
> + ret = -EPERM; /* Not yet implemented */
> goto free_scratch;
> + }
>
> return 0;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 87e389c..04bc66f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
> * PDPE | PDE | PTE | offset
> * The difference as compared to normal x86 3 level page table is the PDPEs are
> * programmed via register.
> + *
> + * GEN8 48b legacy style address is defined as a 4 level page table:
> + * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
> + * PML4E | PDPE | PDE | PTE | offset
> */
> +#define GEN8_PML4ES_PER_PML4 512
> +#define GEN8_PML4E_SHIFT 39
> #define GEN8_PDPE_SHIFT 30
> -#define GEN8_PDPE_MASK 0x3
> +/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
> + * tables */
> +#define GEN8_PDPE_MASK 0x1ff
> #define GEN8_PDE_SHIFT 21
> #define GEN8_PDE_MASK 0x1ff
> #define GEN8_PTE_SHIFT 12
> @@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
> #define GEN8_LEGACY_PDPES 4
> #define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
>
> -/* FIXME: Next patch will use dev */
> -#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
> +#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> + GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
>
> #define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
> #define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
> @@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
> struct i915_page_directory **page_directory;
> };
>
> +struct i915_pml4 {
> + struct i915_page_dma base;
> +
> + DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
> + struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
> +};
> +
> struct i915_address_space {
> struct drm_mm mm;
> struct drm_device *dev;
> @@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
> struct drm_mm_node node;
> unsigned long pd_dirty_rings;
> union {
> - struct i915_page_directory_pointer pdp;
> - struct i915_page_directory pd;
> + struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
> + struct i915_page_directory_pointer pdp; /* GEN8+ */
> + struct i915_page_directory pd; /* GEN6-7 */
> };
>
> struct drm_i915_file_private *file_priv;
>
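As a reading aid for the i915_gem_gtt.h comment quoted above, the 48b
address split (47:39 | 38:30 | 29:21 | 20:12 | 11:0) can be sketched in
plain C. This is only an illustration and not code from the series: the
sketch_* names and the single GEN8_INDEX_MASK are assumptions, while the
shift values are the ones from the quoted defines.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values as in the quoted i915_gem_gtt.h hunk. */
#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT  30
#define GEN8_PDE_SHIFT   21
#define GEN8_PTE_SHIFT   12
/* Assumption: one 9-bit mask for every level (512 entries each). */
#define GEN8_INDEX_MASK  0x1ffULL

/* Hypothetical helpers, named sketch_* so they do not imply driver API. */
static inline unsigned sketch_pml4e_index(uint64_t addr)
{
	return (unsigned)((addr >> GEN8_PML4E_SHIFT) & GEN8_INDEX_MASK);
}

static inline unsigned sketch_pdpe_index(uint64_t addr)
{
	return (unsigned)((addr >> GEN8_PDPE_SHIFT) & GEN8_INDEX_MASK);
}

static inline unsigned sketch_pde_index(uint64_t addr)
{
	return (unsigned)((addr >> GEN8_PDE_SHIFT) & GEN8_INDEX_MASK);
}

static inline unsigned sketch_pte_index(uint64_t addr)
{
	return (unsigned)((addr >> GEN8_PTE_SHIFT) & GEN8_INDEX_MASK);
}

static inline unsigned sketch_page_offset(uint64_t addr)
{
	return (unsigned)(addr & 0xfffULL);
}

/* Build an address from per-level indices; inverse of the helpers above. */
static inline uint64_t sketch_compose(unsigned pml4e, unsigned pdpe,
				      unsigned pde, unsigned pte,
				      unsigned offset)
{
	assert(pml4e < 512 && pdpe < 512 && pde < 512 && pte < 512);
	assert(offset < 4096);
	return ((uint64_t)pml4e << GEN8_PML4E_SHIFT) |
	       ((uint64_t)pdpe << GEN8_PDPE_SHIFT) |
	       ((uint64_t)pde << GEN8_PDE_SHIFT) |
	       ((uint64_t)pte << GEN8_PTE_SHIFT) |
	       (uint64_t)offset;
}
```

With these helpers, an address of 1ULL << 32 gives PML4E index 0 and PDPE
index 4, i.e. the first entry past the four legacy PDPs.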
* Re: [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
2015-07-29 16:23 ` [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
@ 2015-07-30 4:14 ` Goel, Akash
2015-07-30 9:36 ` Michel Thierry
2015-07-30 10:06 ` [PATCH v7 " Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 4:14 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
> the base address to PML4, while the other PDP registers are ignored.
>
> In LRC, the addressing mode must be specified in every context
> descriptor, and the base address to PML4 is stored in the reg state.
>
> v2: PML4 update in legacy context switch is left for historic reasons,
> the preferred mode of operation is with lrc context based submission.
> v3: s/gen8_map_page_directory/gen8_setup_page_directory and
> s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
> Also, clflush will be needed for bxt. (Akash)
> v4: Squashed lrc-specific code and use a macro to set PML4 register.
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> PDP update in bb_start is only for legacy 32b mode.
> v6: Rebase after final merged version of Mika's ppgtt/scratch
> patches.
> v7: There is no need to update the pml4 register value in
> execlists_update_context. (Akash)
> v8: Move pd and pdp setup functions to a previous patch, they do not
> belong here. (Akash)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 17 +++++++----
> drivers/gpu/drm/i915/i915_reg.h | 1 +
> drivers/gpu/drm/i915/intel_lrc.c | 60 ++++++++++++++++++++++++++-----------
> 3 files changed, 55 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 4179b80..c6c8af7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -656,8 +656,8 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
> return 0;
> }
>
> -static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> - struct drm_i915_gem_request *req)
> +static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
> + struct drm_i915_gem_request *req)
> {
> int i, ret;
>
> @@ -672,6 +672,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> return 0;
> }
>
> +static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
> + struct drm_i915_gem_request *req)
> +{
> + return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
> +}
> +
> static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> struct i915_page_directory_pointer *pdp,
> uint64_t start,
> @@ -1318,14 +1324,13 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> ppgtt->base.unbind_vma = ppgtt_unbind_vma;
> ppgtt->base.bind_vma = ppgtt_bind_vma;
>
> - ppgtt->switch_mm = gen8_mm_switch;
> -
> if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
> if (ret)
> goto free_scratch;
>
> ppgtt->base.total = 1ULL << 48;
> + ppgtt->switch_mm = gen8_48b_mm_switch;
> } else {
> ret = __pdp_init(false, &ppgtt->pdp);
> if (ret)
> @@ -1340,6 +1345,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> */
> ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
>
> + ppgtt->switch_mm = gen8_legacy_mm_switch;
> trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
> 0, 0,
> GEN8_PML4E_SHIFT);
> @@ -1537,8 +1543,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
> int j;
>
> for_each_ring(ring, dev_priv, j) {
> + u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_48B : 0;
> I915_WRITE(RING_MODE_GEN7(ring),
> - _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> + _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
> }
> }
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 3a77678..5bd1b6a 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1670,6 +1670,7 @@ enum skl_disp_power_wells {
> #define GFX_REPLAY_MODE (1<<11)
> #define GFX_PSMI_GRANULARITY (1<<10)
> #define GFX_PPGTT_ENABLE (1<<9)
> +#define GEN8_GFX_PPGTT_48B (1<<7)
>
> #define VLV_DISPLAY_BASE 0x180000
> #define VLV_MIPI_BASE VLV_DISPLAY_BASE
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 99bba8e..0b65188 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -196,13 +196,21 @@
> reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
> }
>
> +#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
> + reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(px_dma(&ppgtt->pml4)); \
> + reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
> +}
> +
> enum {
> ADVANCED_CONTEXT = 0,
> - LEGACY_CONTEXT,
> + LEGACY_32B_CONTEXT,
> ADVANCED_AD_CONTEXT,
> LEGACY_64B_CONTEXT
> };
> -#define GEN8_CTX_MODE_SHIFT 3
> +#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
> +#define GEN8_CTX_ADDRESSING_MODE(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> + LEGACY_64B_CONTEXT :\
> + LEGACY_32B_CONTEXT)
> enum {
> FAULT_AND_HANG = 0,
> FAULT_AND_HALT, /* Debug only */
> @@ -273,7 +281,7 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_request *rq)
> WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>
> desc = GEN8_CTX_VALID;
> - desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
> + desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
> if (IS_GEN8(ctx_obj->base.dev))
> desc |= GEN8_CTX_L3LLC_COHERENT;
> desc |= GEN8_CTX_PRIVILEGE;
> @@ -348,10 +356,12 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
> reg_state[CTX_RING_TAIL+1] = rq->tail;
> reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
>
> - /* True PPGTT with dynamic page allocation: update PDP registers and
> - * point the unallocated PDPs to the scratch page
> - */
> - if (ppgtt) {
> + if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + /* True 32b PPGTT with dynamic page allocation: update PDP
> + * registers and point the unallocated PDPs to scratch page.
> + * PML4 is allocated during ppgtt init, so this is not needed
> + * in 48-bit mode.
> + */
> ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> @@ -1512,12 +1522,15 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
> * Ideally, we should set Force PD Restore in ctx descriptor,
> * but we can't. Force Restore would be a second option, but
> * it is unsafe in case of lite-restore (because the ctx is
> - * not idle). */
> + * not idle). PML4 is allocated during ppgtt init so this is
> + * not needed in 48-bit.*/
> if (req->ctx->ppgtt &&
> (intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
> - ret = intel_logical_ring_emit_pdps(req);
> - if (ret)
> - return ret;
> + if (GEN8_CTX_ADDRESSING_MODE(req->i915) == LEGACY_32B_CONTEXT) {
Sorry for the late comment.
For consistency, it would be better to use only the
'USES_FULL_48BIT_PPGTT' macro here, as it implies the same thing, i.e.
which context addressing mode is being used.
Best regards
Akash
> + ret = intel_logical_ring_emit_pdps(req);
> + if (ret)
> + return ret;
> + }
>
> req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
> }
> @@ -2198,13 +2211,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
> reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
>
> - /* With dynamic page allocation, PDPs may not be allocated at this point,
> - * Point the unallocated PDPs to the scratch page
> - */
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> + if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + /* 64b PPGTT (48bit canonical)
> + * PDP0_DESCRIPTOR contains the base address to PML4 and
> + * other PDP Descriptors are ignored.
> + */
> + ASSIGN_CTX_PML4(ppgtt, reg_state);
> + } else {
> + /* 32b PPGTT
> + * PDP*_DESCRIPTOR contains the base address of space supported.
> + * With dynamic page allocation, PDPs may not be allocated at
> + * this point. Point the unallocated PDPs to the scratch page
> + */
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> + }
> +
> if (ring->id == RCS) {
> reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
>
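To illustrate the context descriptor change in the quoted intel_lrc.c
hunk, here is a small standalone sketch of how the addressing mode field
is selected and shifted. The enum values and the shift are copied from
the quoted patch; the sketch_addressing_mode_bits() helper and its bool
parameter are assumptions standing in for GEN8_CTX_ADDRESSING_MODE(dev).

```c
#include <stdbool.h>
#include <stdint.h>

/* Enum order as in the quoted hunk: values 0..3. */
enum {
	ADVANCED_CONTEXT = 0,
	LEGACY_32B_CONTEXT,
	ADVANCED_AD_CONTEXT,
	LEGACY_64B_CONTEXT
};
#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3

/*
 * Hypothetical helper: returns only the addressing mode bits of the
 * descriptor, given whether 48-bit PPGTT is in use.
 */
static inline uint64_t sketch_addressing_mode_bits(bool uses_48bit_ppgtt)
{
	uint64_t mode = uses_48bit_ppgtt ? LEGACY_64B_CONTEXT
					 : LEGACY_32B_CONTEXT;

	return mode << GEN8_CTX_ADDRESSING_MODE_SHIFT;
}
```

Under these assumptions the field is 0x08 for legacy 32b contexts and
0x18 for 64b (48-bit) contexts.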
* Re: [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts
2015-07-29 16:23 ` [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
@ 2015-07-30 4:19 ` Goel, Akash
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 4:19 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> As a step towards implementing 4 levels, while not discarding the
> existing pte insert functions, we need to pass the sg_iter through.
> The current function only understands page directory granularity.
> An object's pages may span the page directory, and so using the iter
> directly as we write the PTEs allows the iterator to stay coherent
> through a VMA insert operation spanning multiple page table levels.
>
> v2: Rebase after s/page_tables/page_table/.
> v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
> updated commit message (s/map/insert).
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index c6c8af7..7c024e98 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -749,7 +749,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> static void
> gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> struct i915_page_directory_pointer *pdp,
> - struct sg_table *pages,
> + struct sg_page_iter *sg_iter,
> uint64_t start,
> enum i915_cache_level cache_level)
> {
> @@ -759,11 +759,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> - struct sg_page_iter sg_iter;
>
> pt_vaddr = NULL;
>
> - for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> + while (__sg_page_iter_next(sg_iter)) {
> if (pt_vaddr == NULL) {
> struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> @@ -771,7 +770,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> }
>
> pt_vaddr[pte] =
> - gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
> + gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
> cache_level, true);
> if (++pte == GEN8_PTES) {
> kunmap_px(ppgtt, pt_vaddr);
> @@ -797,8 +796,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> + struct sg_page_iter sg_iter;
>
> - gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
> + __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
> + gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
> }
>
> static void gen8_free_page_tables(struct drm_device *dev,
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
2015-07-29 16:23 ` [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-07-30 4:46 ` Goel, Akash
2015-07-30 9:31 ` Michel Thierry
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 4:46 UTC (permalink / raw)
To: Michel Thierry, intel-gfx, akash.goel
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> The insert_entries function was the function used to write PTEs. For the
> PPGTT it was "hardcoded" to only understand two level page tables, which
> was the case for GEN7. We can reuse this for 4 level page tables, and
> remove the concept of insert_entries, which was never viable past 2
> level page tables anyway, but it requires a bit of rework to make the
> function a bit more generic.
>
> This patch begins the generalization work, which will be heavily
> relied upon once the 48b code is complete. The patch series attempts to
> make each function that touches a part of the code specific to a page
> table level, and this patch is no exception.
>
> v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++++----------
> 1 file changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index bd56979..f338a13 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -600,24 +600,21 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> return 0;
> }
>
> -static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> - uint64_t start,
> - uint64_t length,
> - bool use_scratch)
> +static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp,
> + uint64_t start,
> + uint64_t length,
> + gen8_pte_t scratch_pte)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> - gen8_pte_t *pt_vaddr, scratch_pte;
> + gen8_pte_t *pt_vaddr;
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> unsigned num_entries = length >> PAGE_SHIFT;
> unsigned last_pte, i;
>
> - scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
> - I915_CACHE_LLC, use_scratch);
> -
Sorry for the late comment.
Would it be better to have a WARN_ON check here for a NULL pdp pointer,
considering that the pdp will no longer be static in the 48-bit case?
There are already such checks in this function for the pd, pt and page
pointers.
Best regards
Akash
> while (num_entries) {
> struct i915_page_directory *pd;
> struct i915_page_table *pt;
> @@ -656,14 +653,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> }
> }
>
> -static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> - struct sg_table *pages,
> - uint64_t start,
> - enum i915_cache_level cache_level, u32 unused)
> +static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> + uint64_t start,
> + uint64_t length,
> + bool use_scratch)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> +
> + gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> + I915_CACHE_LLC, use_scratch);
> +
> + gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
> +}
> +
> +static void
> +gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp,
> + struct sg_table *pages,
> + uint64_t start,
> + enum i915_cache_level cache_level)
> +{
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> @@ -700,6 +713,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> kunmap_px(ppgtt, pt_vaddr);
> }
>
> +static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> + struct sg_table *pages,
> + uint64_t start,
> + enum i915_cache_level cache_level,
> + u32 unused)
> +{
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> + struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> +
> + gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
> +}
> +
> static void gen8_free_page_tables(struct drm_device *dev,
> struct i915_page_directory *pd)
> {
>
* Re: [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-07-29 16:23 ` [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-07-30 4:50 ` Goel, Akash
2015-08-03 8:53 ` [PATCH v9 " Michel Thierry
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 4:50 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
> Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
> it will write to.
>
> Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
>
> This patch was inspired by Ben's "Depend exclusively on map and
> unmap_vma".
>
> v2: Rebase after s/page_tables/page_table/.
> v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
> clamp_pdp in gen8_ppgtt_insert_entries (Akash).
> v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
> maintain symmetry with gen8_ppgtt_insert_entries (Akash).
> v5: Do not mix pages and bytes in insert_entries (Akash).
> v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
> once.
> v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> Use gen8_px_index functions, and remove unnecessary number of pages
> parameter in insert_pte_entries.
> v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
> adding an extra clamp function; remove unnecessary pdp_start/pdp_len
> variables (Akash).
> v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
> length (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 49 +++++++++++++++++++++++++++----------
> 1 file changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7c024e98..7070d42 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -687,9 +687,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> - unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> - unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> - unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> + unsigned pdpe = gen8_pdpe_index(start);
> + unsigned pde = gen8_pde_index(start);
> + unsigned pte = gen8_pte_index(start);
> unsigned num_entries = length >> PAGE_SHIFT;
> unsigned last_pte, i;
>
> @@ -725,7 +725,8 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
>
> pte = 0;
> if (++pde == I915_PDES) {
> - pdpe++;
> + if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> + break;
> pde = 0;
> }
> }
> @@ -738,12 +739,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> -
> gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> I915_CACHE_LLC, use_scratch);
>
> - gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
> + scratch_pte);
> + } else {
> + uint64_t templ4, pml4e;
> + struct i915_page_directory_pointer *pdp;
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + gen8_ppgtt_clear_pte_range(vm, pdp, start, length,
> + scratch_pte);
> + }
> + }
> }
>
> static void
> @@ -756,9 +766,9 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> - unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> - unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> - unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> + unsigned pdpe = gen8_pdpe_index(start);
> + unsigned pde = gen8_pde_index(start);
> + unsigned pte = gen8_pte_index(start);
>
> pt_vaddr = NULL;
>
> @@ -776,7 +786,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> kunmap_px(ppgtt, pt_vaddr);
> pt_vaddr = NULL;
> if (++pde == I915_PDES) {
> - pdpe++;
> + if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> + break;
> pde = 0;
> }
> pte = 0;
> @@ -795,11 +806,23 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> struct sg_page_iter sg_iter;
>
> __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
> - gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
> +
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
> + cache_level);
> + } else {
> + struct i915_page_directory_pointer *pdp;
> + uint64_t templ4, pml4e;
> + uint64_t length = (uint64_t)pages->orig_nents << PAGE_SHIFT;
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
> + start, cache_level);
> + }
> + }
> }
>
> static void gen8_free_page_tables(struct drm_device *dev,
>
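The v6 changelog entry above ("Prevent overflow in sg_nents <<
PAGE_SHIFT, when inserting 4GB at once") is easier to see with numbers.
The sketch below is an illustration only, assuming 4 KiB pages and a
32-bit page count; the sketch_* names are hypothetical. It contrasts
shifting in 32 bits with widening first, as the (uint64_t) cast in the
quoted hunk does.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Shift performed in 32 bits: the top bits are lost before widening. */
static inline uint64_t sketch_length_32b_shift(uint32_t nr_pages)
{
	return (uint32_t)(nr_pages << SKETCH_PAGE_SHIFT);
}

/* Widen first, then shift, matching the cast in the quoted hunk. */
static inline uint64_t sketch_length_64b_shift(uint32_t nr_pages)
{
	return (uint64_t)nr_pages << SKETCH_PAGE_SHIFT;
}
```

For 2^20 pages (4GB), the 32-bit shift wraps to 0 while the widened
shift gives 1ULL << 32; for small counts both agree.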
* Re: [PATCH v6 11/19] drm/i915/gen8: Initialize PDPs and PML4
2015-07-29 16:23 ` [PATCH v6 11/19] drm/i915/gen8: Initialize PDPs and PML4 Michel Thierry
@ 2015-07-30 4:56 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 4:56 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> Similar to PDs, while setting up a page directory pointer, make all entries
> of the pdp point to the scratch pd before mapping (and make all its entries
> point to the scratch page); this is to be safe in case of out of bound
> access or proactive prefetch.
>
> Also add a scratch pdp, which the PML4 entries point to.
>
> v2: Handle scratch_pdp allocation failure correctly, and keep
> initialize_px functions together (Akash)
> v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
> the added macros to initialize the pdps.
> v4: Rebase after final merged version of Mika's ppgtt/scratch patches
> (and removed commit message part related to v3).
> v5: Update commit message to also mention PML4 table initialization and
> the new scratch pdp (Akash).
>
> Suggested-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 38 +++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_gem_gtt.h | 1 +
> 2 files changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7070d42..73cfe56 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -602,6 +602,27 @@ static void free_pdp(struct drm_device *dev,
> }
> }
>
> +static void gen8_initialize_pdp(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp)
> +{
> + gen8_ppgtt_pdpe_t scratch_pdpe;
> +
> + scratch_pdpe = gen8_pdpe_encode(px_dma(vm->scratch_pd), I915_CACHE_LLC);
> +
> + fill_px(vm->dev, pdp, scratch_pdpe);
> +}
> +
> +static void gen8_initialize_pml4(struct i915_address_space *vm,
> + struct i915_pml4 *pml4)
> +{
> + gen8_ppgtt_pml4e_t scratch_pml4e;
> +
> + scratch_pml4e = gen8_pml4e_encode(px_dma(vm->scratch_pdp),
> + I915_CACHE_LLC);
> +
> + fill_px(vm->dev, pml4, scratch_pml4e);
> +}
> +
> static void
> gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
> struct i915_page_directory_pointer *pdp,
> @@ -863,8 +884,20 @@ static int gen8_init_scratch(struct i915_address_space *vm)
> return PTR_ERR(vm->scratch_pd);
> }
>
> + if (USES_FULL_48BIT_PPGTT(dev)) {
> + vm->scratch_pdp = alloc_pdp(dev);
> + if (IS_ERR(vm->scratch_pdp)) {
> + free_pd(dev, vm->scratch_pd);
> + free_pt(dev, vm->scratch_pt);
> + free_scratch_page(dev, vm->scratch_page);
> + return PTR_ERR(vm->scratch_pdp);
> + }
> + }
> +
> gen8_initialize_pt(vm, vm->scratch_pt);
> gen8_initialize_pd(vm, vm->scratch_pd);
> + if (USES_FULL_48BIT_PPGTT(dev))
> + gen8_initialize_pdp(vm, vm->scratch_pdp);
>
> return 0;
> }
> @@ -873,6 +906,8 @@ static void gen8_free_scratch(struct i915_address_space *vm)
> {
> struct drm_device *dev = vm->dev;
>
> + if (USES_FULL_48BIT_PPGTT(dev))
> + free_pdp(dev, vm->scratch_pdp);
> free_pd(dev, vm->scratch_pd);
> free_pt(dev, vm->scratch_pt);
> free_scratch_page(dev, vm->scratch_page);
> @@ -1074,6 +1109,7 @@ gen8_ppgtt_alloc_page_dirpointers(struct i915_address_space *vm,
> if (IS_ERR(pdp))
> goto unwind_out;
>
> + gen8_initialize_pdp(vm, pdp);
> pml4->pdps[pml4e] = pdp;
> __set_bit(pml4e, new_pdps);
> trace_i915_page_directory_pointer_entry_alloc(vm,
> @@ -1353,6 +1389,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> if (ret)
> goto free_scratch;
>
> + gen8_initialize_pml4(&ppgtt->base, &ppgtt->pml4);
> +
> ppgtt->base.total = 1ULL << 48;
> ppgtt->switch_mm = gen8_48b_mm_switch;
> } else {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 11d44b3..70c50e7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -278,6 +278,7 @@ struct i915_address_space {
> struct i915_page_scratch *scratch_page;
> struct i915_page_table *scratch_pt;
> struct i915_page_directory *scratch_pd;
> + struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
>
> /**
> * List of objects currently involved in rendering.
>
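The scratch setup quoted above points every entry of a newly allocated level at the scratch structure one level below it, so an unpopulated slot never refers to stale memory. A minimal stand-alone sketch of that idea follows, assuming 512 eight-byte entries per level; `encode_entry`, its flag bits and `fill_with_scratch` are hypothetical stand-ins, not the driver's `gen8_pdpe_encode()` or `fill_px()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_LEVEL 512	/* assumption: 4 KiB page / 8-byte entries */

/* Hypothetical encoding: page-aligned DMA address plus present/rw bits. */
static uint64_t encode_entry(uint64_t dma_addr)
{
	return (dma_addr & ~0xfffULL) | 0x3;
}

/* Point every slot of a table at the same lower-level scratch structure. */
static void fill_with_scratch(uint64_t *table, uint64_t scratch_dma)
{
	uint64_t entry = encode_entry(scratch_dma);
	size_t i;

	assert(table != NULL);
	for (i = 0; i < ENTRIES_PER_LEVEL; i++)
		table[i] = entry;
}
```

Applied bottom-up (page table, then page directory, then PDP, then PML4), this is roughly the order in which the quoted gen8_init_scratch() and gen8_ppgtt_init() hunks initialize things.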
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 12/19] drm/i915: Expand error state's address width to 64b
2015-07-29 16:23 ` [PATCH v6 12/19] drm/i915: Expand error state's address width to 64b Michel Thierry
@ 2015-07-30 5:09 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:09 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> v2: For semaphore errors, object is mapped to GGTT and offset will not
> be > 4GB, print only lower 32-bits (Akash)
> v3: Print gtt_offset in groups of 32-bit (Chris)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 4 ++--
> drivers/gpu/drm/i915/i915_gpu_error.c | 24 ++++++++++++++----------
> 2 files changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0b5cbe8..33926d9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -546,7 +546,7 @@ struct drm_i915_error_state {
>
> struct drm_i915_error_object {
> int page_count;
> - u32 gtt_offset;
> + u64 gtt_offset;
> u32 *pages[0];
> } *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
>
> @@ -572,7 +572,7 @@ struct drm_i915_error_state {
> u32 size;
> u32 name;
> u32 rseqno[I915_NUM_RINGS], wseqno;
> - u32 gtt_offset;
> + u64 gtt_offset;
> u32 read_domains;
> u32 write_domain;
> s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 6f42569..f79c952 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -197,8 +197,9 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
> err_printf(m, " %s [%d]:\n", name, count);
>
> while (count--) {
> - err_printf(m, " %08x %8u %02x %02x [ ",
> - err->gtt_offset,
> + err_printf(m, " %08x_%08x %8u %02x %02x [ ",
> + upper_32_bits(err->gtt_offset),
> + lower_32_bits(err->gtt_offset),
> err->size,
> err->read_domains,
> err->write_domain);
> @@ -426,15 +427,17 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> err_printf(m, " (submitted by %s [%d])",
> error->ring[i].comm,
> error->ring[i].pid);
> - err_printf(m, " --- gtt_offset = 0x%08x\n",
> - obj->gtt_offset);
> + err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
> + upper_32_bits(obj->gtt_offset),
> + lower_32_bits(obj->gtt_offset));
> print_error_obj(m, obj);
> }
>
> obj = error->ring[i].wa_batchbuffer;
> if (obj) {
> err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
> - dev_priv->ring[i].name, obj->gtt_offset);
> + dev_priv->ring[i].name,
> + lower_32_bits(obj->gtt_offset));
> print_error_obj(m, obj);
> }
>
> @@ -453,14 +456,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> if ((obj = error->ring[i].ringbuffer)) {
> err_printf(m, "%s --- ringbuffer = 0x%08x\n",
> dev_priv->ring[i].name,
> - obj->gtt_offset);
> + lower_32_bits(obj->gtt_offset));
> print_error_obj(m, obj);
> }
>
> if ((obj = error->ring[i].hws_page)) {
> err_printf(m, "%s --- HW Status = 0x%08x\n",
> dev_priv->ring[i].name,
> - obj->gtt_offset);
> + lower_32_bits(obj->gtt_offset));
> offset = 0;
> for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
> err_printf(m, "[%04x] %08x %08x %08x %08x\n",
> @@ -476,13 +479,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> if ((obj = error->ring[i].ctx)) {
> err_printf(m, "%s --- HW Context = 0x%08x\n",
> dev_priv->ring[i].name,
> - obj->gtt_offset);
> + lower_32_bits(obj->gtt_offset));
> print_error_obj(m, obj);
> }
> }
>
> if ((obj = error->semaphore_obj)) {
> - err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
> + err_printf(m, "Semaphore page = 0x%08x\n",
> + lower_32_bits(obj->gtt_offset));
> for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
> err_printf(m, "[%04x] %08x %08x %08x %08x\n",
> elt * 4,
> @@ -590,7 +594,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
> int num_pages;
> bool use_ggtt;
> int i = 0;
> - u32 reloc_offset;
> + u64 reloc_offset;
>
> if (src == NULL || src->pages == NULL)
> return NULL;
>
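The v3 note above ("Print gtt_offset in groups of 32-bit") can be illustrated outside the kernel. This is only a sketch: `upper32`, `lower32` and `format_gtt_offset` are hypothetical user-space stand-ins for the kernel's upper_32_bits()/lower_32_bits() helpers and err_printf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* User-space equivalents of upper_32_bits()/lower_32_bits(). */
static uint32_t upper32(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t lower32(uint64_t v) { return (uint32_t)v; }

/*
 * Format a 64-bit offset as "hhhhhhhh_llllllll".
 * buf must hold at least 18 bytes (17 characters plus the terminator).
 */
static int format_gtt_offset(char *buf, size_t len, uint64_t offset)
{
	assert(buf != NULL && len >= 18);
	return snprintf(buf, len, "%08x_%08x",
			(unsigned int)upper32(offset),
			(unsigned int)lower32(offset));
}
```

Splitting the value keeps the output readable and avoids depending on a 64-bit format specifier; objects that only ever live in the GGTT can keep printing the lower half alone, as the patch does for the ringbuffer, HWS page, context and semaphore page.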
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 13/19] drm/i915/gen8: Add ppgtt info and debug_dump
2015-07-29 16:23 ` [PATCH v6 13/19] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
@ 2015-07-30 5:20 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:20 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> v2: Clean up patch after rebases.
> v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
> v4: Use used_pml4es/pdpes (Akash).
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> v6: Rely on used_px bits instead of null checking (Akash)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
> drivers/gpu/drm/i915/i915_gem_gtt.c | 84 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 94 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 23a69307..b6f1a13 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2250,7 +2250,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
> {
> struct drm_i915_private *dev_priv = dev->dev_private;
> struct intel_engine_cs *ring;
> - struct drm_file *file;
> int i;
>
> if (INTEL_INFO(dev)->gen == 6)
> @@ -2273,13 +2272,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
> ppgtt->debug_dump(ppgtt, m);
> }
>
> - list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> - struct drm_i915_file_private *file_priv = file->driver_priv;
> -
> - seq_printf(m, "proc: %s\n",
> - get_pid_task(file->pid, PIDTYPE_PID)->comm);
> - idr_for_each(&file_priv->context_idr, per_file_ctx, m);
> - }
> seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
> }
>
> @@ -2288,6 +2280,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
> struct drm_info_node *node = m->private;
> struct drm_device *dev = node->minor->dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> + struct drm_file *file;
>
> int ret = mutex_lock_interruptible(&dev->struct_mutex);
> if (ret)
> @@ -2299,6 +2292,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
> else if (INTEL_INFO(dev)->gen >= 6)
> gen6_ppgtt_info(m, dev);
>
> + list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> + struct drm_i915_file_private *file_priv = file->driver_priv;
> +
> + seq_printf(m, "\nproc: %s\n",
> + get_pid_task(file->pid, PIDTYPE_PID)->comm);
> + idr_for_each(&file_priv->context_idr, per_file_ctx,
> + (void *)(unsigned long)m);
> + }
> +
> intel_runtime_pm_put(dev_priv);
> mutex_unlock(&dev->struct_mutex);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 73cfe56..0d7c7c1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1361,6 +1361,89 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
> }
>
> +static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
> + uint64_t start, uint64_t length,
> + gen8_pte_t scratch_pte,
> + struct seq_file *m)
> +{
> + struct i915_page_directory *pd;
> + uint64_t temp;
> + uint32_t pdpe;
> +
> + gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> + struct i915_page_table *pt;
> + uint64_t pd_len = length;
> + uint64_t pd_start = start;
> + uint32_t pde;
> +
> + if (!test_bit(pdpe, pdp->used_pdpes))
> + continue;
> +
> + seq_printf(m, "\tPDPE #%d\n", pdpe);
> + gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
> + uint32_t pte;
> + gen8_pte_t *pt_vaddr;
> +
> + if (!test_bit(pde, pd->used_pdes))
> + continue;
> +
> + pt_vaddr = kmap_px(pt);
> + for (pte = 0; pte < GEN8_PTES; pte += 4) {
> + uint64_t va =
> + (pdpe << GEN8_PDPE_SHIFT) |
> + (pde << GEN8_PDE_SHIFT) |
> + (pte << GEN8_PTE_SHIFT);
> + int i;
> + bool found = false;
> +
> + for (i = 0; i < 4; i++)
> + if (pt_vaddr[pte + i] != scratch_pte)
> + found = true;
> + if (!found)
> + continue;
> +
> + seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
> + for (i = 0; i < 4; i++) {
> + if (pt_vaddr[pte + i] != scratch_pte)
> + seq_printf(m, " %llx", pt_vaddr[pte + i]);
> + else
> + seq_puts(m, " SCRATCH ");
> + }
> + seq_puts(m, "\n");
> + }
> + /* don't use kunmap_px, it could trigger
> + * an unnecessary flush.
> + */
> + kunmap_atomic(pt_vaddr);
> + }
> + }
> +}
> +
> +static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
> +{
> + struct i915_address_space *vm = &ppgtt->base;
> + uint64_t start = ppgtt->base.start;
> + uint64_t length = ppgtt->base.total;
> + gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> + I915_CACHE_LLC, true);
> +
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
> + } else {
> + uint64_t templ4, pml4e;
> + struct i915_pml4 *pml4 = &ppgtt->pml4;
> + struct i915_page_directory_pointer *pdp;
> +
> + gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
> + if (!test_bit(pml4e, pml4->used_pml4es))
> + continue;
> +
> + seq_printf(m, " PML4E #%llu\n", pml4e);
> + gen8_dump_pdp(pdp, start, length, scratch_pte, m);
> + }
> + }
> +}
> +
> /*
> * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
> * with a net effect resembling a 2-level page table in normal x86 terms. Each
> @@ -1383,6 +1466,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> ppgtt->base.clear_range = gen8_ppgtt_clear_range;
> ppgtt->base.unbind_vma = ppgtt_unbind_vma;
> ppgtt->base.bind_vma = ppgtt_bind_vma;
> + ppgtt->debug_dump = gen8_dump_ppgtt;
>
> if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
>
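The dump code quoted above rebuilds a virtual address from its table indices. A stand-alone sketch of the 4-level split is given below, assuming the usual gen8 layout of 4 KiB pages and 9 index bits per level (shifts 12, 21, 30 and 39); the struct and function names are hypothetical. Each index is widened to 64 bits before shifting, since shifting a 32-bit index by 30 or 39 would drop the upper bits once more than four PDPEs are in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, 9 index bits per level. */
#define PTE_SHIFT	12
#define PDE_SHIFT	21
#define PDPE_SHIFT	30
#define PML4E_SHIFT	39
#define INDEX_MASK	0x1ffULL

struct va_indices {
	uint32_t pml4e, pdpe, pde, pte;
};

/* Split a 48-bit virtual address into its four table indices. */
static struct va_indices va_split(uint64_t va)
{
	struct va_indices idx;

	assert(va < (1ULL << 48));
	idx.pml4e = (uint32_t)((va >> PML4E_SHIFT) & INDEX_MASK);
	idx.pdpe = (uint32_t)((va >> PDPE_SHIFT) & INDEX_MASK);
	idx.pde = (uint32_t)((va >> PDE_SHIFT) & INDEX_MASK);
	idx.pte = (uint32_t)((va >> PTE_SHIFT) & INDEX_MASK);
	return idx;
}

/* Rebuild the page-aligned address; widen before shifting. */
static uint64_t va_join(struct va_indices idx)
{
	return ((uint64_t)idx.pml4e << PML4E_SHIFT) |
	       ((uint64_t)idx.pdpe << PDPE_SHIFT) |
	       ((uint64_t)idx.pde << PDE_SHIFT) |
	       ((uint64_t)idx.pte << PTE_SHIFT);
}
```

In legacy 32b mode only PDPEs 0-3 exist and the PML4 index is always 0, so the same split covers both cases.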
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 14/19] drm/i915: object size needs to be u64
2015-07-29 16:23 ` [PATCH v6 14/19] drm/i915: object size needs to be u64 Michel Thierry
@ 2015-07-30 5:22 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:22 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> In a 48b world, users can try to allocate buffers bigger than 4GB; in
> these cases it is important that size is a 64b variable.
>
> v2: Drop the warning about bind with size 0, it shouldn't happen anyway.
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5d68578..80f5d97 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3348,7 +3348,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> {
> struct drm_device *dev = obj->base.dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> - u32 size, fence_size, fence_alignment, unfenced_alignment;
> + u32 fence_alignment, unfenced_alignment;
> + u64 size, fence_size;
> u64 start =
> flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> u64 end =
> @@ -3407,7 +3408,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> * attempt to find space.
> */
> if (size > end) {
> - DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
> + DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
> ggtt_view ? ggtt_view->type : 0,
> size,
> flags & PIN_MAPPABLE ? "mappable" : "total",
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 15/19] drm/i915: batch_obj vm offset must be u64
2015-07-29 16:23 ` [PATCH v6 15/19] drm/i915: batch_obj vm offset must " Michel Thierry
@ 2015-07-30 5:23 ` Goel, Akash
2015-08-05 16:01 ` Daniel Vetter
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:23 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:53 PM, Michel Thierry wrote:
> Otherwise it can overflow in 48-bit mode, and cause an incorrect
> exec_start.
>
> Before commit 5f19e2bffa63a91cd4ac1adcec648e14a44277ce ("drm/i915: Merged
> the many do_execbuf() parameters into a structure"), it was already a u64.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 33926d9..ed2fbcd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1674,7 +1674,7 @@ struct i915_execbuffer_params {
> struct drm_file *file;
> uint32_t dispatch_flags;
> uint32_t args_batch_start_offset;
> - uint32_t batch_obj_vm_offset;
> + uint64_t batch_obj_vm_offset;
> struct intel_engine_cs *ring;
> struct drm_i915_gem_object *batch_obj;
> struct intel_context *ctx;
>
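The overflow described in the commit message above can be reproduced in isolation. The sketch below assumes that exec_start is the batch object's vm offset plus the user-supplied start offset, as the field names suggest; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Start address when the vm offset is stored in a 32-bit field. */
static uint64_t exec_start_u32(uint64_t vm_offset, uint32_t start_offset)
{
	uint32_t narrowed = (uint32_t)vm_offset;	/* drops bits 32..47 */

	return (uint64_t)narrowed + start_offset;
}

/* Start address when the vm offset keeps its full width. */
static uint64_t exec_start_u64(uint64_t vm_offset, uint32_t start_offset)
{
	assert(vm_offset < (1ULL << 48));
	return vm_offset + start_offset;
}
```

For a batch placed at 0x100001000 with a start offset of 0x40, the 32-bit variant yields 0x1040 while the 64-bit variant yields 0x100001040, which is the kind of incorrect exec_start the patch avoids.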
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 16/19] drm/i915/userptr: Kill user_size limit check
2015-07-29 16:24 ` [PATCH v6 16/19] drm/i915/userptr: Kill user_size limit check Michel Thierry
@ 2015-07-30 5:25 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:25 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:54 PM, Michel Thierry wrote:
> GTT was only 32b, so its max size was 4GB. To allow objects bigger than
> 4GB in 48b PPGTT, i915_gem_userptr_ioctl could instead check against
> the max 48b range (1ULL << 48).
>
> But since the check no longer applies, just kill the limit.
>
> v2: Use the default ctx to infer the ppgtt max size (Akash).
> v3: Just kill the limit, it was only there for early detection of an
> error when used for execbuffer (Chris).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_userptr.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 8fd431b..d11901d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -813,7 +813,6 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
> int
> i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> {
> - struct drm_i915_private *dev_priv = dev->dev_private;
> struct drm_i915_gem_userptr *args = data;
> struct drm_i915_gem_object *obj;
> int ret;
> @@ -826,9 +825,6 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
> if (offset_in_page(args->user_ptr | args->user_size))
> return -EINVAL;
>
> - if (args->user_size > dev_priv->gtt.base.total)
> - return -E2BIG;
> -
> if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
> (char __user *)(unsigned long)args->user_ptr, args->user_size))
> return -EFAULT;
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-07-29 16:24 ` [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-07-30 5:39 ` Goel, Akash
2015-08-05 15:58 ` Daniel Vetter
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:39 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
On 7/29/2015 9:54 PM, Michel Thierry wrote:
> There are some allocations that must only be referenced by 32-bit
> offsets. To limit the chances of having the first 4GB already full,
> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> DRM_MM_CREATE_TOP flags.
>
> Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State
> Offset are limited to 32 bits.
>
> Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
> they can be allocated above the 32-bit address range. To limit the
> chances of having the first 4GB already full, objects will use
> DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
>
> v2: Changed flag logic from needs_32b to supports_48b.
> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
> to use last PIN_ defined instead of hard-coded value; use correct limit
> check in eb_vma_misplaced. (Chris)
> v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
> v6: Apply pin-high for ggtt too (Chris)
> v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
> Fix check for entries currently using +4GB addresses, use min_t and
> other polish in object_bind_to_vm (Chris)
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++------
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++++++++++
> include/uapi/drm/i915_drm.h | 3 ++-
> 4 files changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ed2fbcd..c344805 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2775,6 +2775,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
> #define PIN_OFFSET_BIAS (1<<3)
> #define PIN_USER (1<<4)
> #define PIN_UPDATE (1<<5)
> +#define PIN_ZONE_4G (1<<6)
> +#define PIN_HIGH (1<<7)
> #define PIN_OFFSET_MASK (~4095)
> int __must_check
> i915_gem_object_pin(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 80f5d97..e1ca63f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3349,11 +3349,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> struct drm_device *dev = obj->base.dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> u32 fence_alignment, unfenced_alignment;
> + u32 search_flag, alloc_flag;
> + u64 start, end;
> u64 size, fence_size;
> - u64 start =
> - flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> - u64 end =
> - flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
> struct i915_vma *vma;
> int ret;
>
> @@ -3393,6 +3391,13 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> }
>
> + start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> + end = vm->total;
> + if (flags & PIN_MAPPABLE)
> + end = min_t(u64, end, dev_priv->gtt.mappable_end);
> + if (flags & PIN_ZONE_4G)
> + end = min_t(u64, end, (1ULL << 32));
> +
> if (alignment == 0)
> alignment = flags & PIN_MAPPABLE ? fence_alignment :
> unfenced_alignment;
> @@ -3428,13 +3433,21 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> if (IS_ERR(vma))
> goto err_unpin;
>
> + if (flags & PIN_HIGH) {
> + search_flag = DRM_MM_SEARCH_BELOW;
> + alloc_flag = DRM_MM_CREATE_TOP;
> + } else {
> + search_flag = DRM_MM_SEARCH_DEFAULT;
> + alloc_flag = DRM_MM_CREATE_DEFAULT;
> + }
> +
> search_free:
> ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
> size, alignment,
> obj->cache_level,
> start, end,
> - DRM_MM_SEARCH_DEFAULT,
> - DRM_MM_CREATE_DEFAULT);
> + search_flag,
> + alloc_flag);
> if (ret) {
> ret = i915_gem_evict_something(dev, vm, size, alignment,
> obj->cache_level,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 923a3c4..78fc881 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -589,11 +589,20 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> flags |= PIN_GLOBAL;
>
> + /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> + * limit address to the first 4GBs for unflagged objects.
> + */
> + flags |= PIN_ZONE_4G;
> + if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
> + flags &= ~PIN_ZONE_4G;
> +
> if (!drm_mm_node_allocated(&vma->node)) {
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> flags |= PIN_GLOBAL | PIN_MAPPABLE;
> if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> + if ((flags & PIN_MAPPABLE) == 0)
> + flags |= PIN_HIGH;
> }
>
> ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> @@ -671,6 +680,10 @@ eb_vma_misplaced(struct i915_vma *vma)
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> return !only_mappable_for_reloc(entry->flags);
>
> + if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
> + (vma->node.start + vma->node.size - 1) >> 32)
> + return true;
> +
> return false;
> }
>
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index dbd16a2..08e047c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -690,7 +690,8 @@ struct drm_i915_gem_exec_object2 {
> #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
> #define EXEC_OBJECT_NEEDS_GTT (1<<1)
> #define EXEC_OBJECT_WRITE (1<<2)
> -#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
> +#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
> +#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
> __u64 flags;
>
> __u64 rsvd1;
>
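The range clamping and the misplaced-object check in the patch above can be sketched on their own. The flag values below are illustrative only and the function names are hypothetical; the logic mirrors the quoted `min_t()` clamps in i915_gem_object_bind_to_vm() and the `>> 32` test in eb_vma_misplaced().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, not taken from the driver headers. */
#define PIN_MAPPABLE	(1u << 0)
#define PIN_ZONE_4G	(1u << 6)

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Upper bound of the search range: each active flag can only shrink it. */
static uint64_t pin_range_end(unsigned int flags, uint64_t vm_total,
			      uint64_t mappable_end)
{
	uint64_t end = vm_total;

	if (flags & PIN_MAPPABLE)
		end = min_u64(end, mappable_end);
	if (flags & PIN_ZONE_4G)
		end = min_u64(end, 1ULL << 32);
	return end;
}

/* True if the last byte of the node lies at or above 4 GiB. */
static bool node_above_4g(uint64_t start, uint64_t size)
{
	assert(size > 0);
	return ((start + size - 1) >> 32) != 0;
}
```

Checking the last byte rather than the start address matters for a node that begins below 4 GiB but ends above it; such an object still has to be moved when it lacks EXEC_OBJECT_SUPPORTS_48B_ADDRESS.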
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-29 16:24 ` [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch Michel Thierry
@ 2015-07-30 5:49 ` Goel, Akash
2015-07-30 10:09 ` [PATCH v7 " Michel Thierry
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-30 5:49 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
On 7/29/2015 9:54 PM, Michel Thierry wrote:
> Use 48b addresses if hw supports it (i915.enable_ppgtt=3).
>
> Note, aliasing PPGTT remains 32b only.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 5 ++---
> drivers/gpu/drm/i915/i915_params.c | 2 +-
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0d7c7c1..a7d3c07 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -108,8 +108,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
>
> has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
> has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
> - has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
> - INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
> + has_full_64bit_ppgtt = IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9;
>
> if (intel_vgpu_active(dev))
> has_full_ppgtt = false; /* emulation is too hard */
> @@ -147,7 +146,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> }
>
> if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
> - return 2;
> + return has_full_64bit_ppgtt ? 3 : 2;
> else
> return has_aliasing_ppgtt ? 1 : 0;
> }
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 5ae4b0a..d961440 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
> module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
> MODULE_PARM_DESC(enable_ppgtt,
> "Override PPGTT usage. "
> - "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
> + "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
Sorry for the late comment.
Would it be better to use '_48b' here & above, instead of '_64b', to be
precise?
The other patches in the series already use 48 bit.
Best regards
Akash
>
> module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
> MODULE_PARM_DESC(enable_execlists,
>
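The automatic selection at the end of the quoted sanitize_enable_ppgtt() hunk reduces to a small decision function. The sketch below covers only that tail and assumes the capability rules visible in the diff; it leaves out the vGPU override and the handling of explicit module parameter values, and `auto_ppgtt_mode` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns the PPGTT mode picked automatically:
 * 0 = disabled, 1 = aliasing, 2 = full 32b, 3 = full 48b.
 */
static int auto_ppgtt_mode(int gen, bool is_broadwell, bool execlists)
{
	bool has_aliasing_ppgtt = gen >= 6;
	bool has_full_48bit_ppgtt = is_broadwell || gen >= 9;

	assert(gen > 0);
	if (gen >= 8 && execlists)
		return has_full_48bit_ppgtt ? 3 : 2;
	return has_aliasing_ppgtt ? 1 : 0;
}
```

With these rules a gen8 part other than Broadwell still gets full 32b PPGTT under execlists, and any platform without execlists falls back to aliasing PPGTT or none.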
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
2015-07-30 4:46 ` Goel, Akash
@ 2015-07-30 9:31 ` Michel Thierry
0 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 9:31 UTC (permalink / raw)
To: Goel, Akash, intel-gfx
On 7/30/2015 5:46 AM, Goel, Akash wrote:
> On 7/29/2015 9:53 PM, Michel Thierry wrote:
>> The insert_entries function was the function used to write PTEs. For the
>> PPGTT it was "hardcoded" to only understand two level page tables, which
>> was the case for GEN7. We can reuse this for 4 level page tables, and
>> remove the concept of insert_entries, which was never viable past 2
>> level page tables anyway, but it requires a bit of rework to make the
>> function a bit more generic.
>>
>> This patch begins the generalization work, and it will be heavily
>> relied upon once the 48b code is complete. The patch series attempts to
>> make each function that touches page-table code specific to one page
>> table level, and this patch is no exception.
>>
>> v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
>> v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
>>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
>> ---
>> drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++++----------
>> 1 file changed, 39 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> index bd56979..f338a13 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -600,24 +600,21 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
>> return 0;
>> }
>>
>> -static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>> - uint64_t start,
>> - uint64_t length,
>> - bool use_scratch)
>> +static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
>> + struct i915_page_directory_pointer *pdp,
>> + uint64_t start,
>> + uint64_t length,
>> + gen8_pte_t scratch_pte)
>> {
>> struct i915_hw_ppgtt *ppgtt =
>> container_of(vm, struct i915_hw_ppgtt, base);
>> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
>> - gen8_pte_t *pt_vaddr, scratch_pte;
>> + gen8_pte_t *pt_vaddr;
>> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
>> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
>> unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
>> unsigned num_entries = length >> PAGE_SHIFT;
>> unsigned last_pte, i;
>>
>> - scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
>> - I915_CACHE_LLC, use_scratch);
>> -
>
> Sorry for the late comment.
> Would it be better to have a WARN_ON check here for a NULL pdp pointer,
> considering the pdp will no longer be static in the 48-bit case?
>
> Actually there are already such checks used in this function for pd, pt
> and page pointers.
>
> Best regards
> Akash
>
Ok, I'll add the extra WARN_ON(!pdp).
-Michel
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure
2015-07-30 4:01 ` Goel, Akash
@ 2015-07-30 9:31 ` Michel Thierry
0 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 9:31 UTC (permalink / raw)
To: Goel, Akash, intel-gfx
On 7/30/2015 5:01 AM, Goel, Akash wrote:
> On 7/29/2015 9:53 PM, Michel Thierry wrote:
>> Introduces the Page Map Level 4 (PML4), i.e. the new top level structure
>> of the page tables.
>>
>> To facilitate testing, 48b mode will be available on Broadwell and
>> GEN9+, when i915.enable_ppgtt = 3.
>>
>> v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
>> 32/64-bit safe (Chris).
>> v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
>>
>> Cc: Akash Goel <akash.goel@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> @@ -557,6 +563,8 @@ static void free_pdp(struct drm_device *dev,
>> struct i915_page_directory_pointer *pdp)
>> {
>> __pdp_fini(pdp);
>> + if (USES_FULL_48BIT_PPGTT(dev))
>> + kfree(pdp);
>
> Sorry for the late comment.
> This change is a bit of a distraction here; it should be moved to the
> following 'alloc/free for 4lvl' patch.
>
> Best regards
> Akash
>
kfree(pdp) moved to patch 7/19.
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
2015-07-30 4:14 ` Goel, Akash
@ 2015-07-30 9:36 ` Michel Thierry
0 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 9:36 UTC (permalink / raw)
To: Goel, Akash, intel-gfx
On 7/30/2015 5:14 AM, Goel, Akash wrote:
> On 7/29/2015 9:53 PM, Michel Thierry wrote:
>> @@ -1512,12 +1522,15 @@ static int gen8_emit_bb_start(struct
>> drm_i915_gem_request *req,
>> * Ideally, we should set Force PD Restore in ctx descriptor,
>> * but we can't. Force Restore would be a second option, but
>> * it is unsafe in case of lite-restore (because the ctx is
>> - * not idle). */
>> + * not idle). PML4 is allocated during ppgtt init so this is
>> + * not needed in 48-bit.*/
>> if (req->ctx->ppgtt &&
>> (intel_ring_flag(req->ring) &
>> req->ctx->ppgtt->pd_dirty_rings)) {
>> - ret = intel_logical_ring_emit_pdps(req);
>> - if (ret)
>> - return ret;
>> + if (GEN8_CTX_ADDRESSING_MODE(req->i915) == LEGACY_32B_CONTEXT) {
> Sorry for the late comment.
> For consistency, better to use 'USES_FULL_48BIT_PPGTT' macro only here,
> as that will also imply the same thing i.e. which type of Context
> addressing mode is being used.
>
> Best regards
> Akash
>
Ack. I'm changing it to if (!USES_FULL_48BIT_PPGTT(req->i915)).
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v7 03/19] drm/i915/gen8: Abstract PDP usage
2015-07-29 16:23 ` [PATCH v6 03/19] drm/i915/gen8: Abstract PDP usage Michel Thierry
@ 2015-07-30 10:02 ` Michel Thierry
2015-07-31 4:11 ` Goel, Akash
0 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 10:02 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e. The temporary pdp local variable will be
removed once the rest of the 4-level code lands.
Also, start passing the vm pointer to the alloc functions, instead of
ppgtt.
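As a rough illustration of the abstraction described above (a standalone
toy model, not driver code; every toy_* name is made up for this sketch),
the helper takes the pdp it should operate on as a parameter, and only the
entry point decides which pdp that is:

```c
#include <stddef.h>

#define TOY_PDPES 4 /* legacy 32b: 1 PDP with 4 PDPEs */

/* Hypothetical stand-ins for i915_page_directory_pointer / i915_hw_ppgtt. */
struct toy_pdp {
	void *page_directory[TOY_PDPES];
};

struct toy_ppgtt {
	struct toy_pdp pdp; /* root of the page tables in 32b mode */
};

/* Works on whichever pdp it is handed; never reaches into ppgtt->pdp. */
static int toy_count_page_directories(const struct toy_pdp *pdp)
{
	int i, used = 0;

	for (i = 0; i < TOY_PDPES; i++)
		if (pdp->page_directory[i] != NULL)
			used++;

	return used;
}

/* Entry point: the only place that knows the root is ppgtt->pdp. */
static int toy_ppgtt_count(const struct toy_ppgtt *ppgtt)
{
	/* In 48b mode this pointer would come from a pml4e instead. */
	const struct toy_pdp *pdp = &ppgtt->pdp;

	return toy_count_page_directories(pdp);
}
```

With this shape, switching the root to a pml4 later should only require
changing where the entry point gets its pdp from; the helper stays as is.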
v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
v7: Keep pagetable map in-line (and avoid unnecessary for_each_pde
loops), remove redundant ppgtt pointer in _alloc_pagetabs (Akash)
v8: Fix text indentation in _alloc_pagetabs/page_directories (Chris)
v9: Defer gen8_alloc_va_range_4lvl definition until 4lvl is implemented,
clean-up gen8_ppgtt_cleanup [pun intended] (Akash).
v10: Clean-up commit message (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 84 +++++++++++++++++++------------------
1 file changed, 44 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 28f3227..bd56979 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -607,6 +607,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr, scratch_pte;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -621,10 +622,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
struct i915_page_directory *pd;
struct i915_page_table *pt;
- if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+ if (WARN_ON(!pdp->page_directory[pdpe]))
break;
- pd = ppgtt->pdp.page_directory[pdpe];
+ pd = pdp->page_directory[pdpe];
if (WARN_ON(!pd->page_table[pde]))
break;
@@ -662,6 +663,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -675,7 +677,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
break;
if (pt_vaddr == NULL) {
- struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
+ struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
pt_vaddr = kmap_px(pt);
}
@@ -755,28 +757,29 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+ struct drm_device *dev = ppgtt->base.dev;
int i;
- for_each_set_bit(i, ppgtt->pdp.used_pdpes,
- I915_PDPES_PER_PDP(ppgtt->base.dev)) {
- if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+ for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+ if (WARN_ON(!pdp->page_directory[i]))
continue;
- gen8_free_page_tables(ppgtt->base.dev,
- ppgtt->pdp.page_directory[i]);
- free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
+ gen8_free_page_tables(dev, pdp->page_directory[i]);
+ free_pd(dev, pdp->page_directory[i]);
}
- free_pdp(ppgtt->base.dev, &ppgtt->pdp);
+ free_pdp(dev, pdp);
+
gen8_free_scratch(vm);
}
/**
* gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt: Master ppgtt structure.
- * @pd: Page directory for this address range.
+ * @vm: Master vm structure.
+ * @pd: Page directory for this address range.
* @start: Starting virtual address to begin allocations.
- * @length Size of the allocations.
+ * @length: Size of the allocations.
* @new_pts: Bitmap set by function with new allocations. Likely used by the
* caller to free on error.
*
@@ -789,13 +792,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
*
* Return: 0 if success; negative error code otherwise.
*/
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
struct i915_page_directory *pd,
uint64_t start,
uint64_t length,
unsigned long *new_pts)
{
- struct drm_device *dev = ppgtt->base.dev;
+ struct drm_device *dev = vm->dev;
struct i915_page_table *pt;
uint64_t temp;
uint32_t pde;
@@ -804,7 +807,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
/* Don't reallocate page tables */
if (test_bit(pde, pd->used_pdes)) {
/* Scratch is never allocated this way */
- WARN_ON(pt == ppgtt->base.scratch_pt);
+ WARN_ON(pt == vm->scratch_pt);
continue;
}
@@ -812,7 +815,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
if (IS_ERR(pt))
goto unwind_out;
- gen8_initialize_pt(&ppgtt->base, pt);
+ gen8_initialize_pt(vm, pt);
pd->page_table[pde] = pt;
__set_bit(pde, new_pts);
}
@@ -828,11 +831,11 @@ unwind_out:
/**
* gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt: Master ppgtt structure.
+ * @vm: Master vm structure.
* @pdp: Page directory pointer for this address range.
* @start: Starting virtual address to begin allocations.
- * @length Size of the allocations.
- * @new_pds Bitmap set by function with new allocations. Likely used by the
+ * @length: Size of the allocations.
+ * @new_pds: Bitmap set by function with new allocations. Likely used by the
* caller to free on error.
*
* Allocate the required number of page directories starting at the pde index of
@@ -849,13 +852,14 @@ unwind_out:
*
* Return: 0 if success; negative error code otherwise.
*/
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
- struct i915_page_directory_pointer *pdp,
- uint64_t start,
- uint64_t length,
- unsigned long *new_pds)
+static int
+gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length,
+ unsigned long *new_pds)
{
- struct drm_device *dev = ppgtt->base.dev;
+ struct drm_device *dev = vm->dev;
struct i915_page_directory *pd;
uint64_t temp;
uint32_t pdpe;
@@ -871,7 +875,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
if (IS_ERR(pd))
goto unwind_out;
- gen8_initialize_pd(&ppgtt->base, pd);
+ gen8_initialize_pd(vm, pd);
pdp->page_directory[pdpe] = pd;
__set_bit(pdpe, new_pds);
}
@@ -947,18 +951,19 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
}
static int gen8_alloc_va_range(struct i915_address_space *vm,
- uint64_t start,
- uint64_t length)
+ uint64_t start, uint64_t length)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
unsigned long *new_page_dirs, **new_page_tables;
+ struct drm_device *dev = vm->dev;
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct i915_page_directory *pd;
const uint64_t orig_start = start;
const uint64_t orig_length = length;
uint64_t temp;
uint32_t pdpe;
- uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
+ uint32_t pdpes = I915_PDPES_PER_PDP(dev);
int ret;
/* Wrap is never okay since we can only represent 48b, and we don't
@@ -967,7 +972,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
if (WARN_ON(start + length < start))
return -ENODEV;
- if (WARN_ON(start + length > ppgtt->base.total))
+ if (WARN_ON(start + length > vm->total))
return -ENODEV;
ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
@@ -975,16 +980,16 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
return ret;
/* Do the allocations first so we can easily bail out */
- ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
- new_page_dirs);
+ ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
+ new_page_dirs);
if (ret) {
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
/* For every page directory referenced, allocate page tables */
- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
- ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+ ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
new_page_tables[pdpe]);
if (ret)
goto err_out;
@@ -995,7 +1000,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
/* Allocations have completed successfully, so set the bitmaps, and do
* the mappings. */
- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
gen8_pde_t *const page_directory = kmap_px(pd);
struct i915_page_table *pt;
uint64_t pd_len = length;
@@ -1028,8 +1033,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
}
kunmap_px(ppgtt, page_directory);
-
- __set_bit(pdpe, ppgtt->pdp.used_pdpes);
+ __set_bit(pdpe, pdp->used_pdpes);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1039,11 +1043,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
err_out:
while (pdpe--) {
for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
- free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
+ free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
}
for_each_set_bit(pdpe, new_page_dirs, pdpes)
- free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
+ free_pd(dev, pdp->page_directory[pdpe]);
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
mark_tlbs_dirty(ppgtt);
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v7 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
2015-07-29 16:23 ` [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-07-30 4:46 ` Goel, Akash
@ 2015-07-30 10:02 ` Michel Thierry
2015-07-31 4:00 ` Goel, Akash
1 sibling, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 10:02 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
The insert_entries function was used to write PTEs. For the
PPGTT it was "hardcoded" to only understand two level page tables, which
was the case for GEN7. We can reuse this for 4 level page tables, and
remove the concept of insert_entries, which was never viable past 2
level page tables anyway, but it requires a bit of rework to make the
function a bit more generic.
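A minimal sketch of the index walk that such a generic PTE writer relies
on, assuming 512 entries per page table and per page directory. The
toy_pte_cursor type and toy_cursor_advance() are hypothetical names used
only for this illustration; they are not part of the patch:

```c
#define TOY_PTES 512 /* entries per page table (assumed) */
#define TOY_PDES 512 /* entries per page directory (assumed) */

/* Hypothetical cursor over the pdpe/pde/pte indices of one pdp. */
struct toy_pte_cursor {
	unsigned pdpe;
	unsigned pde;
	unsigned pte;
};

/*
 * Step to the next PTE. When a page table is exhausted, move on to the
 * next page directory entry; when a page directory is exhausted, move on
 * to the next pdpe. Nothing here depends on how many levels sit above
 * the pdp, which is what lets 4 level code reuse the same walk.
 */
static void toy_cursor_advance(struct toy_pte_cursor *c)
{
	if (++c->pte == TOY_PTES) {
		c->pte = 0;
		if (++c->pde == TOY_PDES) {
			c->pde = 0;
			c->pdpe++;
		}
	}
}
```

A caller would write one PTE at the cursor position, then advance, for
each page in the scatterlist.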
v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
v4: Check and warn for NULL value of pdp pointer (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 53 ++++++++++++++++++++++++++++---------
1 file changed, 41 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd56979..740ad5b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -600,23 +600,23 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
}
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
- uint64_t start,
- uint64_t length,
- bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length,
+ gen8_pte_t scratch_pte)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
- gen8_pte_t *pt_vaddr, scratch_pte;
+ gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
- scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
- I915_CACHE_LLC, use_scratch);
+ if (WARN_ON(!pdp))
+ return;
while (num_entries) {
struct i915_page_directory *pd;
@@ -656,14 +656,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
}
}
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
- struct sg_table *pages,
- uint64_t start,
- enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+ uint64_t start,
+ uint64_t length,
+ bool use_scratch)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+ gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+ I915_CACHE_LLC, use_scratch);
+
+ gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+}
+
+static void
+gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -700,6 +716,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
kunmap_px(ppgtt, pt_vaddr);
}
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level,
+ u32 unused)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+ gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+}
+
static void gen8_free_page_tables(struct drm_device *dev,
struct i915_page_directory *pd)
{
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v7 06/19] drm/i915/gen8: Add PML4 structure
2015-07-29 16:23 ` [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure Michel Thierry
2015-07-30 4:01 ` Goel, Akash
@ 2015-07-30 10:04 ` Michel Thierry
2015-07-31 4:35 ` Goel, Akash
2015-07-31 12:12 ` [PATCH v8 " Michel Thierry
1 sibling, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 10:04 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Introduces the Page Map Level 4 (PML4), i.e. the new top level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
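To make the new top level concrete, here is a small standalone sketch of
how a 48b address splits into per-level indices. The shift values are the
ones this patch defines in i915_gem_gtt.h; TOY_INDEX_MASK, the
toy_va_indices type and toy_decode_va() are hypothetical names for this
illustration only:

```c
#include <stdint.h>

/* Shifts as defined in i915_gem_gtt.h by this patch. */
#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT  30
#define GEN8_PDE_SHIFT   21
#define GEN8_PTE_SHIFT   12

#define TOY_INDEX_MASK   0x1ff /* 9 bits of index per level */
#define TOY_OFFSET_MASK  0xfff /* 12 bits of offset within a 4K page */

struct toy_va_indices {
	unsigned pml4e;  /* bits 47:39 */
	unsigned pdpe;   /* bits 38:30 */
	unsigned pde;    /* bits 29:21 */
	unsigned pte;    /* bits 20:12 */
	unsigned offset; /* bits 11:0  */
};

static struct toy_va_indices toy_decode_va(uint64_t addr)
{
	struct toy_va_indices idx;

	idx.pml4e  = (addr >> GEN8_PML4E_SHIFT) & TOY_INDEX_MASK;
	idx.pdpe   = (addr >> GEN8_PDPE_SHIFT) & TOY_INDEX_MASK;
	idx.pde    = (addr >> GEN8_PDE_SHIFT) & TOY_INDEX_MASK;
	idx.pte    = (addr >> GEN8_PTE_SHIFT) & TOY_INDEX_MASK;
	idx.offset = addr & TOY_OFFSET_MASK;

	return idx;
}
```

Any address below 4GB decodes to pml4e 0 and a pdpe in the range 0-3,
which matches the legacy 32b layout of 1 PDP with 4 PDPEs.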
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: Defer kfree of the pdp until the 4lvl alloc/free patch (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 36 +++++++++++++++++++++++-------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++-----
3 files changed, 46 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 04aa34a..4729eaf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
#define USES_PPGTT(dev) (i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
+#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
#define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
#define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7f71746..3288154 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,12 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
{
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+ bool has_full_64bit_ppgtt;
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+ has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
+ INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +128,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
+ if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+ return 3;
+
#ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -689,9 +695,6 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
- if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
- break;
-
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -1105,14 +1108,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
return ret;
ppgtt->base.start = 0;
- ppgtt->base.total = 1ULL << 32;
- if (IS_ENABLED(CONFIG_X86_32))
- /* While we have a proliferation of size_t variables
- * we cannot represent the full ppgtt size on 32bit,
- * so limit it to the same size as the GGTT (currently
- * 2GiB).
- */
- ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
ppgtt->base.allocate_va_range = gen8_alloc_va_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
@@ -1122,10 +1117,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
- ret = __pdp_init(false, &ppgtt->pdp);
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = __pdp_init(false, &ppgtt->pdp);
- if (ret)
+ if (ret)
+ goto free_scratch;
+
+ ppgtt->base.total = 1ULL << 32;
+ if (IS_ENABLED(CONFIG_X86_32))
+ /* While we have a proliferation of size_t variables
+ * we cannot represent the full ppgtt size on 32bit,
+ * so limit it to the same size as the GGTT (currently
+ * 2GiB).
+ */
+ ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+ } else {
+ ppgtt->base.total = 1ULL << 48;
+ ret = -EPERM; /* Not yet implemented */
goto free_scratch;
+ }
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 87e389c..04bc66f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
* PDPE | PDE | PTE | offset
* The difference as compared to normal x86 3 level page table is the PDPEs are
* programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
+ * PML4E | PDPE | PDE | PTE | offset
*/
+#define GEN8_PML4ES_PER_PML4 512
+#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT 30
-#define GEN8_PDPE_MASK 0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK 0x1ff
#define GEN8_PDE_SHIFT 21
#define GEN8_PDE_MASK 0x1ff
#define GEN8_PTE_SHIFT 12
@@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
-/* FIXME: Next patch will use dev */
-#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
+#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
@@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
struct i915_page_directory **page_directory;
};
+struct i915_pml4 {
+ struct i915_page_dma base;
+
+ DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+ struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
+};
+
struct i915_address_space {
struct drm_mm mm;
struct drm_device *dev;
@@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
struct drm_mm_node node;
unsigned long pd_dirty_rings;
union {
- struct i915_page_directory_pointer pdp;
- struct i915_page_directory pd;
+ struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
+ struct i915_page_directory_pointer pdp; /* GEN8+ */
+ struct i915_page_directory pd; /* GEN6-7 */
};
struct drm_i915_file_private *file_priv;
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v7 07/19] drm/i915/gen8: implement alloc/free for 4lvl
2015-07-29 16:23 ` [PATCH v6 07/19] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
@ 2015-07-30 10:05 ` Michel Thierry
2015-07-31 4:20 ` Goel, Akash
0 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 10:05 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
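A rough standalone sketch of how a 4lvl range can be cut into per-PML4E
pieces before each piece is handed to the 3lvl code, assuming each PML4
entry covers 512GB (1 << 39 bytes) and that start + length does not wrap.
The toy_* helpers are hypothetical and only illustrate the arithmetic;
the patch itself iterates with gen8_for_each_pml4e:

```c
#include <stdint.h>

#define TOY_PML4E_SHIFT 39
#define TOY_PML4E_SPAN  (1ULL << TOY_PML4E_SHIFT) /* 512GB per PML4 entry */

/* Number of PML4 entries touched by [start, start + length). */
static unsigned toy_pml4es_spanned(uint64_t start, uint64_t length)
{
	uint64_t first, last;

	if (length == 0)
		return 0;

	first = start >> TOY_PML4E_SHIFT;
	last = (start + length - 1) >> TOY_PML4E_SHIFT;

	return (unsigned)(last - first + 1);
}

/*
 * Length of the piece of [start, start + length) that falls inside the
 * PML4 entry containing start. This is the part a 3lvl allocator would
 * handle for that entry's pdp.
 */
static uint64_t toy_chunk_in_pml4e(uint64_t start, uint64_t length)
{
	uint64_t room = TOY_PML4E_SPAN - (start & (TOY_PML4E_SPAN - 1));

	return length < room ? length : room;
}
```

A range that starts one page below a 512GB boundary and is two pages long
touches two PML4 entries, with one page handled under each pdp.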
v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v12: Fix pdpe start value in trace (Akash)
v13: Define all 4lvl functions in this patch directly, instead of
previous patches, add i915_page_directory_pointer_entry_alloc here,
use test_bit to detect when pdp is already allocated (Akash).
v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
function, as we do for pds and pts; move pd and pdp setup functions to
this patch (Akash).
v15: Added kfree(pdp) from previous patch to this (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 239 +++++++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_gem_gtt.h | 15 ++-
drivers/gpu/drm/i915/i915_trace.h | 8 ++
3 files changed, 246 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3288154..c498eaa 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -210,6 +210,9 @@ static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
return pde;
}
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
static gen6_pte_t snb_pte_encode(dma_addr_t addr,
enum i915_cache_level level,
bool valid, u32 unused)
@@ -559,10 +562,73 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
pdp->page_directory = NULL;
}
+static struct
+i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
+{
+ struct i915_page_directory_pointer *pdp;
+ int ret = -ENOMEM;
+
+ WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+ pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
+ if (!pdp)
+ return ERR_PTR(-ENOMEM);
+
+ ret = __pdp_init(dev, pdp);
+ if (ret)
+ goto fail_bitmap;
+
+ ret = setup_px(dev, pdp);
+ if (ret)
+ goto fail_page_m;
+
+ return pdp;
+
+fail_page_m:
+ __pdp_fini(pdp);
+fail_bitmap:
+ kfree(pdp);
+
+ return ERR_PTR(ret);
+}
+
static void free_pdp(struct drm_device *dev,
struct i915_page_directory_pointer *pdp)
{
__pdp_fini(pdp);
+ if (USES_FULL_48BIT_PPGTT(dev)) {
+ cleanup_px(dev, pdp);
+ kfree(pdp);
+ }
+}
+
+static void
+gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
+ struct i915_page_directory_pointer *pdp,
+ struct i915_page_directory *pd,
+ int index)
+{
+ gen8_ppgtt_pdpe_t *page_directorypo;
+
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+ return;
+
+ page_directorypo = kmap_px(pdp);
+ page_directorypo[index] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
+ kunmap_px(ppgtt, page_directorypo);
+}
+
+static void
+gen8_setup_page_directory_pointer(struct i915_hw_ppgtt *ppgtt,
+ struct i915_pml4 *pml4,
+ struct i915_page_directory_pointer *pdp,
+ int index)
+{
+ gen8_ppgtt_pml4e_t *pagemap = kmap_px(pml4);
+
+ WARN_ON(!USES_FULL_48BIT_PPGTT(ppgtt->base.dev));
+ pagemap[index] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
+ kunmap_px(ppgtt, pagemap);
}
/* Broadwell Page Directory Pointer Descriptors */
@@ -785,12 +851,9 @@ static void gen8_free_scratch(struct i915_address_space *vm)
free_scratch_page(dev, vm->scratch_page);
}
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
+ struct i915_page_directory_pointer *pdp)
{
- struct i915_hw_ppgtt *ppgtt =
- container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
- struct drm_device *dev = ppgtt->base.dev;
int i;
for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
@@ -802,6 +865,31 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
}
free_pdp(dev, pdp);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+ int i;
+
+ for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+ if (WARN_ON(!ppgtt->pml4.pdps[i]))
+ continue;
+
+ gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
+ }
+
+ cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+ gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
+ else
+ gen8_ppgtt_cleanup_4lvl(ppgtt);
gen8_free_scratch(vm);
}
@@ -923,6 +1011,60 @@ unwind_out:
return -ENOMEM;
}
+/**
+ * gen8_ppgtt_alloc_page_dirpointers() - Allocate pdps for VA range.
+ * @vm: Master vm structure.
+ * @pml4: Page map level 4 for this address range.
+ * @start: Starting virtual address to begin allocations.
+ * @length: Size of the allocations.
+ * @new_pdps: Bitmap set by function with new allocations. Likely used by the
+ * caller to free on error.
+ *
+ * Allocate the required number of page directory pointers. Extremely similar to
+ * gen8_ppgtt_alloc_page_directories() and gen8_ppgtt_alloc_pagetabs().
+ * The main difference is here we are limited by the pml4 boundary (instead of
+ * the page directory pointer).
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int
+gen8_ppgtt_alloc_page_dirpointers(struct i915_address_space *vm,
+ struct i915_pml4 *pml4,
+ uint64_t start,
+ uint64_t length,
+ unsigned long *new_pdps)
+{
+ struct drm_device *dev = vm->dev;
+ struct i915_page_directory_pointer *pdp;
+ uint64_t temp;
+ uint32_t pml4e;
+
+ WARN_ON(!bitmap_empty(new_pdps, GEN8_PML4ES_PER_PML4));
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+ if (!test_bit(pml4e, pml4->used_pml4es)) {
+ pdp = alloc_pdp(dev);
+ if (IS_ERR(pdp))
+ goto unwind_out;
+
+ pml4->pdps[pml4e] = pdp;
+ __set_bit(pml4e, new_pdps);
+ trace_i915_page_directory_pointer_entry_alloc(vm,
+ pml4e,
+ start,
+ GEN8_PML4E_SHIFT);
+ }
+ }
+
+ return 0;
+
+unwind_out:
+ for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+ free_pdp(dev, pml4->pdps[pml4e]);
+
+ return -ENOMEM;
+}
+
static void
free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
uint32_t pdpes)
@@ -984,14 +1126,15 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
}
-static int gen8_alloc_va_range(struct i915_address_space *vm,
- uint64_t start, uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
unsigned long *new_page_dirs, **new_page_tables;
struct drm_device *dev = vm->dev;
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct i915_page_directory *pd;
const uint64_t orig_start = start;
const uint64_t orig_length = length;
@@ -1072,6 +1215,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
kunmap_px(ppgtt, page_directory);
__set_bit(pdpe, pdp->used_pdpes);
+ gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1092,6 +1236,68 @@ err_out:
return ret;
}
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+ struct i915_pml4 *pml4,
+ uint64_t start,
+ uint64_t length)
+{
+ DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp;
+ uint64_t temp, pml4e;
+ int ret = 0;
+
+ /* Do the pml4 allocations first, so we don't need to track the newly
+ * allocated tables below the pdp */
+ bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+ /* The pagedirectory and pagetable allocations are done in the shared 3
+ * and 4 level code. Just allocate the pdps.
+ */
+ ret = gen8_ppgtt_alloc_page_dirpointers(vm, pml4, start, length,
+ new_pdps);
+ if (ret)
+ return ret;
+
+ WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+ "The allocation has spanned more than 512GB. "
+ "It is highly likely this is incorrect.");
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+ WARN_ON(!pdp);
+
+ ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+ if (ret)
+ goto err_out;
+
+ gen8_setup_page_directory_pointer(ppgtt, pml4, pdp, pml4e);
+ }
+
+ bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+ GEN8_PML4ES_PER_PML4);
+
+ return 0;
+
+err_out:
+ for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+ gen8_ppgtt_cleanup_3lvl(vm->dev, pml4->pdps[pml4e]);
+
+ return ret;
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (USES_FULL_48BIT_PPGTT(vm->dev))
+ return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+ else
+ return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+}
+
/*
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
* with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -1117,9 +1323,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
- if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
- ret = __pdp_init(false, &ppgtt->pdp);
+ if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
+ if (ret)
+ goto free_scratch;
+ ppgtt->base.total = 1ULL << 48;
+ } else {
+ ret = __pdp_init(false, &ppgtt->pdp);
if (ret)
goto free_scratch;
@@ -1131,10 +1342,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
* 2GiB).
*/
ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
- } else {
- ppgtt->base.total = 1ULL << 48;
- ret = -EPERM; /* Not yet implemented */
- goto free_scratch;
+
+ trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
+ 0, 0,
+ GEN8_PML4E_SHIFT);
}
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 04bc66f..11d44b3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -39,6 +39,8 @@ struct drm_i915_file_private;
typedef uint32_t gen6_pte_t;
typedef uint64_t gen8_pte_t;
typedef uint64_t gen8_pde_t;
+typedef uint64_t gen8_ppgtt_pdpe_t;
+typedef uint64_t gen8_ppgtt_pml4e_t;
#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
@@ -95,6 +97,7 @@ typedef uint64_t gen8_pde_t;
*/
#define GEN8_PML4ES_PER_PML4 512
#define GEN8_PML4E_SHIFT 39
+#define GEN8_PML4E_MASK (GEN8_PML4ES_PER_PML4 - 1)
#define GEN8_PDPE_SHIFT 30
/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
* tables */
@@ -465,6 +468,15 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter) \
+ for (iter = gen8_pml4e_index(start); \
+ pdp = (pml4)->pdps[iter], \
+ length > 0 && iter < GEN8_PML4ES_PER_PML4; \
+ iter++, \
+ temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start, \
+ temp = min(temp, length), \
+ start += temp, length -= temp)
+
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
@@ -482,8 +494,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
static inline uint32_t gen8_pml4e_index(uint64_t address)
{
- WARN_ON(1); /* For 64B */
- return 0;
+ return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
}
static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index f230d76..e6b5c74 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -221,6 +221,14 @@ DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_entry_alloc,
__entry->vm, __entry->px, __entry->start, __entry->end)
);
+DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_pointer_entry_alloc,
+ TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+ TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+ TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+ __entry->vm, __entry->px, __entry->start, __entry->end)
+);
+
/* Avoid extra math because we only support two sizes. The format is defined by
* bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
#define TRACE_PT_SIZE(bits) \
--
2.5.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 82+ messages in thread
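The index arithmetic behind gen8_pml4e_index() and gen8_for_each_pml4e in the patch above can be illustrated with a small standalone sketch. It is not driver code: the 39 and 30 bit shifts and the 512-entry mask come from the patch, while the 21 and 12 bit shifts for the PDE and PTE levels, and the names split_va() and count_pml4es(), are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define PML4E_SHIFT 39        /* from the patch: GEN8_PML4E_SHIFT */
#define PDPE_SHIFT  30        /* from the patch: GEN8_PDPE_SHIFT */
#define PDE_SHIFT   21        /* assumption: 512 entries per level */
#define PTE_SHIFT   12        /* assumption: 4 KiB pages */
#define INDEX_MASK  0x1ffULL  /* 512 entries per table */

/* Hypothetical helper type: one index per translation level. */
struct va_indices {
	uint32_t pml4e, pdpe, pde, pte;
};

/* Split a 48-bit GPU virtual address into its four table indices. */
static struct va_indices split_va(uint64_t addr)
{
	struct va_indices idx = {
		.pml4e = (uint32_t)((addr >> PML4E_SHIFT) & INDEX_MASK),
		.pdpe  = (uint32_t)((addr >> PDPE_SHIFT) & INDEX_MASK),
		.pde   = (uint32_t)((addr >> PDE_SHIFT) & INDEX_MASK),
		.pte   = (uint32_t)((addr >> PTE_SHIFT) & INDEX_MASK),
	};
	return idx;
}

/*
 * Count how many PML4 entries the range [start, start + length) touches,
 * stepping to the next 512 GiB boundary on each iteration, in the same
 * spirit as the iterator macro. Assumes the range fits in 48 bits.
 */
static unsigned int count_pml4es(uint64_t start, uint64_t length)
{
	const uint64_t span = 1ULL << PML4E_SHIFT;
	unsigned int count = 0;

	assert(length > 0 && start + length <= (1ULL << 48));

	while (length > 0) {
		/* Bytes left before the next 512 GiB boundary. */
		uint64_t chunk = span - (start & (span - 1));

		if (chunk > length)
			chunk = length;
		start += chunk;
		length -= chunk;
		count++;
	}
	return count;
}
```

A range that straddles a 512 GiB boundary touches two PML4 entries, which is the case the WARN in gen8_alloc_va_range_4lvl() treats as the expected upper bound.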
* [PATCH v7 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
2015-07-29 16:23 ` [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
2015-07-30 4:14 ` Goel, Akash
@ 2015-07-30 10:06 ` Michel Thierry
2015-07-31 4:23 ` Goel, Akash
1 sibling, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 10:06 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address to PML4, while the other PDP registers are ignored.
In LRC, the addressing mode must be specified in every context
descriptor, and the base address to PML4 is stored in the reg state.
v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch
patches.
v7: There is no need to update the pml4 register value in
execlists_update_context. (Akash)
v8: Move pd and pdp setup functions to a previous patch, they do not
belong here. (Akash)
v9: Check USES_FULL_48BIT_PPGTT instead of GEN8_CTX_ADDRESSING_MODE in
gen8_emit_bb_start to check if emit pdps is needed. (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 17 +++++++----
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_lrc.c | 60 ++++++++++++++++++++++++++-----------
3 files changed, 55 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c498eaa..ae2e082 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -656,8 +656,8 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
return 0;
}
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
- struct drm_i915_gem_request *req)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct drm_i915_gem_request *req)
{
int i, ret;
@@ -672,6 +672,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
}
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct drm_i915_gem_request *req)
+{
+ return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
+}
+
static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
struct i915_page_directory_pointer *pdp,
uint64_t start,
@@ -1321,14 +1327,13 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
ppgtt->base.bind_vma = ppgtt_bind_vma;
- ppgtt->switch_mm = gen8_mm_switch;
-
if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
if (ret)
goto free_scratch;
ppgtt->base.total = 1ULL << 48;
+ ppgtt->switch_mm = gen8_48b_mm_switch;
} else {
ret = __pdp_init(false, &ppgtt->pdp);
if (ret)
@@ -1343,6 +1348,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
*/
ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+ ppgtt->switch_mm = gen8_legacy_mm_switch;
trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
0, 0,
GEN8_PML4E_SHIFT);
@@ -1540,8 +1546,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
int j;
for_each_ring(ring, dev_priv, j) {
+ u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_48B : 0;
I915_WRITE(RING_MODE_GEN7(ring),
- _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+ _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
}
}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3a77678..5bd1b6a 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1670,6 +1670,7 @@ enum skl_disp_power_wells {
#define GFX_REPLAY_MODE (1<<11)
#define GFX_PSMI_GRANULARITY (1<<10)
#define GFX_PPGTT_ENABLE (1<<9)
+#define GEN8_GFX_PPGTT_48B (1<<7)
#define VLV_DISPLAY_BASE 0x180000
#define VLV_MIPI_BASE VLV_DISPLAY_BASE
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 99bba8e..4c40614 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -196,13 +196,21 @@
reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
}
+#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
+ reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(px_dma(&ppgtt->pml4)); \
+ reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
+}
+
enum {
ADVANCED_CONTEXT = 0,
- LEGACY_CONTEXT,
+ LEGACY_32B_CONTEXT,
ADVANCED_AD_CONTEXT,
LEGACY_64B_CONTEXT
};
-#define GEN8_CTX_MODE_SHIFT 3
+#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
+#define GEN8_CTX_ADDRESSING_MODE(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ LEGACY_64B_CONTEXT :\
+ LEGACY_32B_CONTEXT)
enum {
FAULT_AND_HANG = 0,
FAULT_AND_HALT, /* Debug only */
@@ -273,7 +281,7 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_request *rq)
WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
desc = GEN8_CTX_VALID;
- desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+ desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
if (IS_GEN8(ctx_obj->base.dev))
desc |= GEN8_CTX_L3LLC_COHERENT;
desc |= GEN8_CTX_PRIVILEGE;
@@ -348,10 +356,12 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
reg_state[CTX_RING_TAIL+1] = rq->tail;
reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
- /* True PPGTT with dynamic page allocation: update PDP registers and
- * point the unallocated PDPs to the scratch page
- */
- if (ppgtt) {
+ if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ /* True 32b PPGTT with dynamic page allocation: update PDP
+ * registers and point the unallocated PDPs to scratch page.
+ * PML4 is allocated during ppgtt init, so this is not needed
+ * in 48-bit mode.
+ */
ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
@@ -1512,12 +1522,15 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
* Ideally, we should set Force PD Restore in ctx descriptor,
* but we can't. Force Restore would be a second option, but
* it is unsafe in case of lite-restore (because the ctx is
- * not idle). */
+ * not idle). PML4 is allocated during ppgtt init so this is
+ * not needed in 48-bit.*/
if (req->ctx->ppgtt &&
(intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
- ret = intel_logical_ring_emit_pdps(req);
- if (ret)
- return ret;
+ if (!USES_FULL_48BIT_PPGTT(req->i915)) {
+ ret = intel_logical_ring_emit_pdps(req);
+ if (ret)
+ return ret;
+ }
req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
}
@@ -2198,13 +2211,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- /* With dynamic page allocation, PDPs may not be allocated at this point,
- * Point the unallocated PDPs to the scratch page
- */
- ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+ if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ /* 64b PPGTT (48bit canonical)
+ * PDP0_DESCRIPTOR contains the base address to PML4 and
+ * other PDP Descriptors are ignored.
+ */
+ ASSIGN_CTX_PML4(ppgtt, reg_state);
+ } else {
+ /* 32b PPGTT
+ * PDP*_DESCRIPTOR contains the base address of space supported.
+ * With dynamic page allocation, PDPs may not be allocated at
+ * this point. Point the unallocated PDPs to the scratch page
+ */
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+ }
+
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
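As a rough illustration of the two LRC pieces this patch touches, the sketch below selects the addressing mode bits for the context descriptor and splits a PML4 DMA address into the two 32-bit halves stored in the PDP0 slots. The enum order and the shift of 3 are taken from the diff; the function names mode_bits() and split_pml4_addr(), and the sample address in the note below, are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same ordering as the enum in the patch. */
enum ctx_addressing_mode {
	ADVANCED_CONTEXT = 0,
	LEGACY_32B_CONTEXT,
	ADVANCED_AD_CONTEXT,
	LEGACY_64B_CONTEXT
};

#define CTX_ADDRESSING_MODE_SHIFT 3 /* from the patch */

/* Descriptor bits that encode the addressing mode (hypothetical helper). */
static uint64_t mode_bits(bool uses_full_48bit_ppgtt)
{
	enum ctx_addressing_mode mode = uses_full_48bit_ppgtt ?
		LEGACY_64B_CONTEXT : LEGACY_32B_CONTEXT;

	return (uint64_t)mode << CTX_ADDRESSING_MODE_SHIFT;
}

/*
 * Split the PML4 base address into upper and lower dwords, as the
 * ASSIGN_CTX_PML4 macro does with upper_32_bits()/lower_32_bits().
 */
static void split_pml4_addr(uint64_t pml4_dma, uint32_t *udw, uint32_t *ldw)
{
	assert(udw && ldw);

	*udw = (uint32_t)(pml4_dma >> 32);
	*ldw = (uint32_t)(pml4_dma & 0xffffffffULL);
}
```

With a made-up PML4 address of 0x0000123456789000, the upper dword is 0x1234 and the lower dword is 0x56789000. In 48-bit mode only the PDP0 pair is meaningful, which is why the patch skips the PDP1-3 assignments there.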
* [PATCH v7 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-29 16:24 ` [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch Michel Thierry
2015-07-30 5:49 ` Goel, Akash
@ 2015-07-30 10:09 ` Michel Thierry
2015-07-31 12:13 ` [PATCH v8 " Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 10:09 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Use 48b addresses if hw supports it (i915.enable_ppgtt=3).
Note, aliasing PPGTT remains 32b only.
v2: s/full_64b/full_48b/. (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 9 ++++-----
drivers/gpu/drm/i915/i915_params.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c792591..31d20c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,12 +104,11 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
{
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
- bool has_full_64bit_ppgtt;
+ bool has_full_48bit_ppgtt;
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
- has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
- INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+ has_full_48bit_ppgtt = IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9;
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -128,7 +127,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
- if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+ if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
return 3;
#ifdef CONFIG_INTEL_IOMMU
@@ -147,7 +146,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
}
if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
- return 2;
+ return has_full_48bit_ppgtt ? 3 : 2;
else
return has_aliasing_ppgtt ? 1 : 0;
}
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 5ae4b0a..900e48a 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
MODULE_PARM_DESC(enable_ppgtt,
"Override PPGTT usage. "
- "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+ "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_48b)");
module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
MODULE_PARM_DESC(enable_execlists,
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
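The selection logic that this patch changes can be modelled, in a much reduced form, as follows. This is a sketch of only the branches visible in the hunks above: the vGPU, IOMMU and other checks of the real sanitize_enable_ppgtt() are deliberately left out, and struct hw_caps and pick_ppgtt_mode() are hypothetical names. Requests other than -1, 2 and 3 are outside the scope of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device information the driver consults. */
struct hw_caps {
	int gen;
	bool is_broadwell;
	bool execlists;
};

/*
 * Returns 0 (disabled), 1 (aliasing), 2 (full 32b) or 3 (full 48b).
 * Only requested values of -1 (auto), 2 and 3 are modelled here.
 */
static int pick_ppgtt_mode(const struct hw_caps *hw, int requested)
{
	bool has_aliasing_ppgtt = hw->gen >= 6;
	bool has_full_ppgtt = hw->gen >= 7;
	bool has_full_48bit_ppgtt = hw->is_broadwell || hw->gen >= 9;

	assert(requested == -1 || requested == 2 || requested == 3);

	if (requested == 2 && has_full_ppgtt)
		return 2;

	if (requested == 3 && has_full_48bit_ppgtt)
		return 3;

	/* Auto, or a request the hardware cannot honour. */
	if (hw->gen >= 8 && hw->execlists)
		return has_full_48bit_ppgtt ? 3 : 2;

	return has_aliasing_ppgtt ? 1 : 0;
}
```

Under this model, a Broadwell part with execlists enabled picks mode 3 automatically, and an explicit request for mode 2 on the same hardware is still honoured.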
* Re: [PATCH v6 00/19] 48-bit PPGTT
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (18 preceding siblings ...)
2015-07-29 16:24 ` [PATCH v6 19/19] drm/i915: Save some page table setup on repeated binds Michel Thierry
@ 2015-07-30 11:26 ` Chris Wilson
2015-07-30 11:52 ` Michel Thierry
2015-08-03 9:51 ` Michel Thierry
20 siblings, 1 reply; 82+ messages in thread
From: Chris Wilson @ 2015-07-30 11:26 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Jul 29, 2015 at 05:23:44PM +0100, Michel Thierry wrote:
> This clean-up version delays the 48-bit work to later patches and includes
> more review comments from Akash and Chris. The first 5 patches prepare the
> dynamic page allocation code to handle independent pdps, but no specific
> code for 48-bit mode is added before the 5th patch.
>
> In order to expand the GPU address space, a 4th level translation is added,
> the Page Map Level 4 (PML4). This PML4 has 512 PML4 Entries (PML4E),
> PML4[0-511], each pointing to a PDP. All the existing "dynamic alloc
> ppgtt" functions are used, only adding the 4th level changes. I also
> updated some remaining variables that were 32b only.
>
> There are 2 hardware workarounds needed to allow correct operation with
> 48b addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset).
> A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can
> be allocated outside the first 4 PDPs; if not, the end range is forced to 4GB.
> Also, more objects now use the DRM_MM_CREATE_TOP flag. To maintain
> compatibility, in libdrm I added a new drm_intel_bo_emit_reloc_48bit function
> that will flag these objects, while the existing drm_intel_bo_emit_reloc
> clears it.
>
> Finally, this feature is only available in BDW and Gen9, requires LRC
> submission mode (execlists) and it can be detected by i915.enable_ppgtt=3.
>
> Also note that this expanded address space is only available for full
> PPGTT, aliasing PPGTT and Global GTT remain 32-bit.
>
> I'll resend the userland patches (libdrm/mesa) in a different patchset, there
> haven't been changes on them, but they require a rebase. I will also expand the
> ppgtt igt test per Chris suggestions.
Just a heads-up: I haven't root-caused this yet, but with
i915.enable_ppgtt=2 I started getting GPU hangs that didn't happen
before this series...
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 00/19] 48-bit PPGTT
2015-07-30 11:26 ` [PATCH v6 00/19] 48-bit PPGTT Chris Wilson
@ 2015-07-30 11:52 ` Michel Thierry
2015-07-30 12:13 ` Chris Wilson
2015-07-30 19:02 ` Chris Wilson
0 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-30 11:52 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, akash.goel
On 7/30/2015 12:26 PM, Chris Wilson wrote:
> On Wed, Jul 29, 2015 at 05:23:44PM +0100, Michel Thierry wrote:
>> This clean-up version delays the 48-bit work to later patches and includes
>> more review comments from Akash and Chris. The first 5 patches prepare the
>> dynamic page allocation code to handle independent pdps, but no specific
>> code for 48-bit mode is added before the 5th patch.
>>
>> In order to expand the GPU address space, a 4th level translation is added,
>> the Page Map Level 4 (PML4). This PML4 has 512 PML4 Entries (PML4E),
>> PML4[0-511], each pointing to a PDP. All the existing "dynamic alloc
>> ppgtt" functions are used, only adding the 4th level changes. I also
>> updated some remaining variables that were 32b only.
>>
>> There are 2 hardware workarounds needed to allow correct operation with
>> 48b addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset).
>> A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can
>> be allocated outside the first 4 PDPs; if not, the end range is forced to 4GB.
>> Also, more objects now use the DRM_MM_CREATE_TOP flag. To maintain
>> compatibility, in libdrm I added a new drm_intel_bo_emit_reloc_48bit function
>> that will flag these objects, while the existing drm_intel_bo_emit_reloc
>> clears it.
>>
>> Finally, this feature is only available in BDW and Gen9, requires LRC
>> submission mode (execlists) and it can be detected by i915.enable_ppgtt=3.
>>
>> Also note that this expanded address space is only available for full
>> PPGTT, aliasing PPGTT and Global GTT remain 32-bit.
>>
>> I'll resend the userland patches (libdrm/mesa) in a different patchset, there
>> haven't been changes on them, but they require a rebase. I will also expand the
>> ppgtt igt test per Chris suggestions.
>
> Just a heads-up: I haven't root-caused this yet, but with
> i915.enable_ppgtt=2 I started getting GPU hangs that didn't happen
> before this series...
> -Chris
>
Sounds like I screwed up something in the first 4 patches or in the
Wa32bit one. The rest of the changes are contained to 48-bit code.
Have you found a way to reproduce it?
Thanks,
-Michel
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 00/19] 48-bit PPGTT
2015-07-30 11:52 ` Michel Thierry
@ 2015-07-30 12:13 ` Chris Wilson
2015-07-30 19:02 ` Chris Wilson
1 sibling, 0 replies; 82+ messages in thread
From: Chris Wilson @ 2015-07-30 12:13 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Thu, Jul 30, 2015 at 12:52:19PM +0100, Michel Thierry wrote:
> On 7/30/2015 12:26 PM, Chris Wilson wrote:
> >Just a heads-up: I haven't root-caused this yet, but with
> >i915.enable_ppgtt=2 I started getting GPU hangs that didn't happen
> >before this series...
>
> Sounds like I screwed up something in the first 4 patches or in the
> Wa32bit one. The rest of the changes are contained to 48-bit code.
It's also likely to be bdw-specific, since I've been running the same
kernel on snb/ivb/hsw without issue. I just thought I would do a quick
comparison of ppgtt=3 against ppgtt=2 when the problems started.
> Have you found a way to reproduce it?
It was in the middle of the ue4 Reflections demo, though it had run
through a sample of other tests seemingly without issue.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 00/19] 48-bit PPGTT
2015-07-30 11:52 ` Michel Thierry
2015-07-30 12:13 ` Chris Wilson
@ 2015-07-30 19:02 ` Chris Wilson
1 sibling, 0 replies; 82+ messages in thread
From: Chris Wilson @ 2015-07-30 19:02 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Thu, Jul 30, 2015 at 12:52:19PM +0100, Michel Thierry wrote:
> Sounds like I screwed up something in the first 4 patches or in the
> Wa32bit one. The rest of the changes are contained to 48-bit code.
>
> Have you found a way to reproduce it?
Seems like no. Whatever happened this morning, it hasn't happened since
prepping the tree for a bisect (recompiling and retesting the last known
bad/good).
Panic over for the time being.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v7 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
@ 2015-07-31 4:00 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 4:00 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 7/30/2015 3:32 PM, Michel Thierry wrote:
> The insert_entries function was the function used to write PTEs. For the
> PPGTT it was "hardcoded" to only understand two level page tables, which
> was the case for GEN7. We can reuse this for 4 level page tables, and
> remove the concept of insert_entries, which was never viable past 2
> level page tables anyway, but it requires a bit of rework to make the
> function a bit more generic.
>
> v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> v3: Rebase after final merged version of Mika's ppgtt/scratch patches.
> v4: Check and warn for NULL value of pdp pointer (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 53 ++++++++++++++++++++++++++++---------
> 1 file changed, 41 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index bd56979..740ad5b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -600,23 +600,23 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> return 0;
> }
>
> -static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> - uint64_t start,
> - uint64_t length,
> - bool use_scratch)
> +static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp,
> + uint64_t start,
> + uint64_t length,
> + gen8_pte_t scratch_pte)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> - gen8_pte_t *pt_vaddr, scratch_pte;
> + gen8_pte_t *pt_vaddr;
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> unsigned num_entries = length >> PAGE_SHIFT;
> unsigned last_pte, i;
>
> - scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
> - I915_CACHE_LLC, use_scratch);
> + if (WARN_ON(!pdp))
> + return;
>
> while (num_entries) {
> struct i915_page_directory *pd;
> @@ -656,14 +656,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> }
> }
>
> -static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> - struct sg_table *pages,
> - uint64_t start,
> - enum i915_cache_level cache_level, u32 unused)
> +static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> + uint64_t start,
> + uint64_t length,
> + bool use_scratch)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> +
> + gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> + I915_CACHE_LLC, use_scratch);
> +
> + gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
> +}
> +
> +static void
> +gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp,
> + struct sg_table *pages,
> + uint64_t start,
> + enum i915_cache_level cache_level)
> +{
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> @@ -700,6 +716,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> kunmap_px(ppgtt, pt_vaddr);
> }
>
> +static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> + struct sg_table *pages,
> + uint64_t start,
> + enum i915_cache_level cache_level,
> + u32 unused)
> +{
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> + struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> +
> + gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
> +}
> +
> static void gen8_free_page_tables(struct drm_device *dev,
> struct i915_page_directory *pd)
> {
>
^ permalink raw reply [flat|nested] 82+ messages in thread
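The PTE range helpers in the patch quoted above start from the pdpe/pde/pte indices of the start address and then advance through page tables. The carry from one level to the next can be sketched as below. The loop body is elided in the quoted diff, so this is a guess at its general shape rather than the driver's code: the 512-entry table size, the 21 and 12 bit shifts, and the name walk_pte_range() are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PDPE_SHIFT       30   /* from the series: GEN8_PDPE_SHIFT */
#define PDE_SHIFT        21   /* assumption */
#define PTE_SHIFT        12   /* assumption: 4 KiB pages */
#define ENTRIES_PER_TBL  512u /* assumption: 512 entries per table */
#define INDEX_MASK       0x1ffu

/* Hypothetical result type: tables visited and the final pdpe index. */
struct walk_result {
	unsigned int tables;
	unsigned int end_pdpe;
};

/*
 * Walk the PTEs covering [start, start + length) below a single PDP,
 * one page table per iteration, carrying pte -> pde -> pdpe.
 */
static struct walk_result walk_pte_range(uint64_t start, uint64_t length)
{
	unsigned int pdpe = (unsigned int)(start >> PDPE_SHIFT) & INDEX_MASK;
	unsigned int pde = (unsigned int)(start >> PDE_SHIFT) & INDEX_MASK;
	unsigned int pte = (unsigned int)(start >> PTE_SHIFT) & INDEX_MASK;
	uint64_t num_entries = length >> PTE_SHIFT;
	struct walk_result res = { 0, 0 };

	assert((start & ((1ULL << PTE_SHIFT) - 1)) == 0);

	while (num_entries) {
		/* Entries handled in this page table. */
		uint64_t last_pte = pte + num_entries;

		if (last_pte > ENTRIES_PER_TBL)
			last_pte = ENTRIES_PER_TBL;

		num_entries -= last_pte - pte;
		res.tables++;

		/* Move on to the first entry of the next page table. */
		pte = 0;
		if (++pde == ENTRIES_PER_TBL) {
			pdpe++;
			pde = 0;
		}
	}

	res.end_pdpe = pdpe;
	return res;
}
```

Passing the pdp as a parameter, as the patch does, means a walk like this can be reused unchanged for each PDP that a PML4 entry points to.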
* Re: [PATCH v7 03/19] drm/i915/gen8: Abstract PDP usage
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
@ 2015-07-31 4:11 ` Goel, Akash
2015-08-05 15:33 ` Daniel Vetter
0 siblings, 1 reply; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 4:11 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 7/30/2015 3:32 PM, Michel Thierry wrote:
> Up until now, ppgtt->pdp has always been the root of our page tables.
> Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
>
> In preparation for 4 level page tables, we need to stop using ppgtt->pdp
> directly unless we know it's what we want. The future structure will use
> ppgtt->pml4 for the top level, and the pdp is just one of the entries
> being pointed to by a pml4e. The temporary pdp local variable will be
> removed once the rest of the 4-level code lands.
>
> Also, start passing the vm pointer to the alloc functions, instead of
> ppgtt.
>
> v2: Updated after dynamic page allocation changes.
> v3: Rebase after s/page_tables/page_table/.
> v4: Rebase after changes in "Dynamic page table allocations" patch.
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
> v7: Keep pagetable map in-line (and avoid unnecessary for_each_pde
> loops), remove redundant ppgtt pointer in _alloc_pagetabs (Akash)
> v8: Fix text indentation in _alloc_pagetabs/page_directories (Chris)
> v9: Defer gen8_alloc_va_range_4lvl definition until 4lvl is implemented,
> clean-up gen8_ppgtt_cleanup [pun intended] (Akash).
> v10: Clean-up commit message (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 84 +++++++++++++++++++------------------
> 1 file changed, 44 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 28f3227..bd56979 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -607,6 +607,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> + struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> gen8_pte_t *pt_vaddr, scratch_pte;
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> @@ -621,10 +622,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> struct i915_page_directory *pd;
> struct i915_page_table *pt;
>
> - if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
> + if (WARN_ON(!pdp->page_directory[pdpe]))
> break;
>
> - pd = ppgtt->pdp.page_directory[pdpe];
> + pd = pdp->page_directory[pdpe];
>
> if (WARN_ON(!pd->page_table[pde]))
> break;
> @@ -662,6 +663,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> + struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> gen8_pte_t *pt_vaddr;
> unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> @@ -675,7 +677,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> break;
>
> if (pt_vaddr == NULL) {
> - struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
> + struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> pt_vaddr = kmap_px(pt);
> }
> @@ -755,28 +757,29 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> + struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> + struct drm_device *dev = ppgtt->base.dev;
> int i;
>
> - for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> - I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> - if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> + for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
> + if (WARN_ON(!pdp->page_directory[i]))
> continue;
>
> - gen8_free_page_tables(ppgtt->base.dev,
> - ppgtt->pdp.page_directory[i]);
> - free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
> + gen8_free_page_tables(dev, pdp->page_directory[i]);
> + free_pd(dev, pdp->page_directory[i]);
> }
>
> - free_pdp(ppgtt->base.dev, &ppgtt->pdp);
> + free_pdp(dev, pdp);
> +
> gen8_free_scratch(vm);
> }
>
> /**
> * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
> - * @ppgtt: Master ppgtt structure.
> - * @pd: Page directory for this address range.
> + * @vm: Master vm structure.
> + * @pd: Page directory for this address range.
> * @start: Starting virtual address to begin allocations.
> - * @length Size of the allocations.
> + * @length: Size of the allocations.
> * @new_pts: Bitmap set by function with new allocations. Likely used by the
> * caller to free on error.
> *
> @@ -789,13 +792,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> *
> * Return: 0 if success; negative error code otherwise.
> */
> -static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> +static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
> struct i915_page_directory *pd,
> uint64_t start,
> uint64_t length,
> unsigned long *new_pts)
> {
> - struct drm_device *dev = ppgtt->base.dev;
> + struct drm_device *dev = vm->dev;
> struct i915_page_table *pt;
> uint64_t temp;
> uint32_t pde;
> @@ -804,7 +807,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> /* Don't reallocate page tables */
> if (test_bit(pde, pd->used_pdes)) {
> /* Scratch is never allocated this way */
> - WARN_ON(pt == ppgtt->base.scratch_pt);
> + WARN_ON(pt == vm->scratch_pt);
> continue;
> }
>
> @@ -812,7 +815,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> if (IS_ERR(pt))
> goto unwind_out;
>
> - gen8_initialize_pt(&ppgtt->base, pt);
> + gen8_initialize_pt(vm, pt);
> pd->page_table[pde] = pt;
> __set_bit(pde, new_pts);
> }
> @@ -828,11 +831,11 @@ unwind_out:
>
> /**
> * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
> - * @ppgtt: Master ppgtt structure.
> + * @vm: Master vm structure.
> * @pdp: Page directory pointer for this address range.
> * @start: Starting virtual address to begin allocations.
> - * @length Size of the allocations.
> - * @new_pds Bitmap set by function with new allocations. Likely used by the
> + * @length: Size of the allocations.
> + * @new_pds: Bitmap set by function with new allocations. Likely used by the
> * caller to free on error.
> *
> * Allocate the required number of page directories starting at the pde index of
> @@ -849,13 +852,14 @@ unwind_out:
> *
> * Return: 0 if success; negative error code otherwise.
> */
> -static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> - struct i915_page_directory_pointer *pdp,
> - uint64_t start,
> - uint64_t length,
> - unsigned long *new_pds)
> +static int
> +gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp,
> + uint64_t start,
> + uint64_t length,
> + unsigned long *new_pds)
> {
> - struct drm_device *dev = ppgtt->base.dev;
> + struct drm_device *dev = vm->dev;
> struct i915_page_directory *pd;
> uint64_t temp;
> uint32_t pdpe;
> @@ -871,7 +875,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> if (IS_ERR(pd))
> goto unwind_out;
>
> - gen8_initialize_pd(&ppgtt->base, pd);
> + gen8_initialize_pd(vm, pd);
> pdp->page_directory[pdpe] = pd;
> __set_bit(pdpe, new_pds);
> }
> @@ -947,18 +951,19 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> }
>
> static int gen8_alloc_va_range(struct i915_address_space *vm,
> - uint64_t start,
> - uint64_t length)
> + uint64_t start, uint64_t length)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> unsigned long *new_page_dirs, **new_page_tables;
> + struct drm_device *dev = vm->dev;
> + struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> struct i915_page_directory *pd;
> const uint64_t orig_start = start;
> const uint64_t orig_length = length;
> uint64_t temp;
> uint32_t pdpe;
> - uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
> + uint32_t pdpes = I915_PDPES_PER_PDP(dev);
> int ret;
>
> /* Wrap is never okay since we can only represent 48b, and we don't
> @@ -967,7 +972,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> if (WARN_ON(start + length < start))
> return -ENODEV;
>
> - if (WARN_ON(start + length > ppgtt->base.total))
> + if (WARN_ON(start + length > vm->total))
> return -ENODEV;
>
> ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
> @@ -975,16 +980,16 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> return ret;
>
> /* Do the allocations first so we can easily bail out */
> - ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
> - new_page_dirs);
> + ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
> + new_page_dirs);
> if (ret) {
> free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> return ret;
> }
>
> /* For every page directory referenced, allocate page tables */
> - gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> - ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
> + gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> + ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
> new_page_tables[pdpe]);
> if (ret)
> goto err_out;
> @@ -995,7 +1000,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>
> /* Allocations have completed successfully, so set the bitmaps, and do
> * the mappings. */
> - gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> + gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> gen8_pde_t *const page_directory = kmap_px(pd);
> struct i915_page_table *pt;
> uint64_t pd_len = length;
> @@ -1028,8 +1033,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> }
>
> kunmap_px(ppgtt, page_directory);
> -
> - __set_bit(pdpe, ppgtt->pdp.used_pdpes);
> + __set_bit(pdpe, pdp->used_pdpes);
> }
>
> free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> @@ -1039,11 +1043,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> err_out:
> while (pdpe--) {
> for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
> - free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
> + free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
> }
>
> for_each_set_bit(pdpe, new_page_dirs, pdpes)
> - free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
> + free_pd(dev, pdp->page_directory[pdpe]);
>
> free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> mark_tlbs_dirty(ppgtt);
>
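
For readers following the range checks at the top of gen8_alloc_va_range() in the quoted patch, here is a minimal user-space sketch of the same two tests: unsigned wrap-around, then the bound against the address space size. The function name check_va_range and the standalone `total` parameter are assumptions made for illustration; in the driver the bound comes from vm->total and the checks are wrapped in WARN_ON.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the two checks in the quoted code.
 * Returns 0 if [start, start + length) is usable, -ENODEV otherwise.
 */
static inline int check_va_range(uint64_t start, uint64_t length,
				 uint64_t total)
{
	/* Unsigned overflow: the sum wrapped past 2^64. */
	if (start + length < start)
		return -ENODEV;

	/* The range ends beyond the size of the address space. */
	if (start + length > total)
		return -ENODEV;

	return 0;
}
```

The wrap test has to come first: once the sum has wrapped, comparing it against `total` no longer says anything about the real end of the range.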
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v7 07/19] drm/i915/gen8: implement alloc/free for 4lvl
2015-07-30 10:05 ` [PATCH v7 " Michel Thierry
@ 2015-07-31 4:20 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 4:20 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 7/30/2015 3:35 PM, Michel Thierry wrote:
> PML4 has no special attributes, and there will always be a PML4.
> So simply initialize it at creation, and destroy it at the end.
>
> The code for 4lvl is able to call into the existing 3lvl page table code
> to handle all of the lower levels.
>
> v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
> compiler happy. And define ret only in one place.
> Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
> v3: Use i915_dma_unmap_single instead of pci API. Fix a
> couple of incorrect checks when unmapping pdp and pd pages (Akash).
> v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
> v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
> v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
> paths. (Akash)
> v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
> v8: Change location of pml4_init/fini. It will make next patches
> cleaner.
> v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
> trying to reuse as much as possible for pdp alloc. pml4_init/fini
> replaced by setup/cleanup_px macros.
> v10: Rebase after Mika's merged ppgtt cleanup patch series.
> v11: Rebase after final merged version of Mika's ppgtt/scratch
> patches.
> v12: Fix pdpe start value in trace (Akash)
> v13: Define all 4lvl functions in this patch directly, instead of
> previous patches, add i915_page_directory_pointer_entry_alloc here,
> use test_bit to detect when pdp is already allocated (Akash).
> v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
> function, as we do for pds and pts; move pd and pdp setup functions to
> this patch (Akash).
> v15: Added kfree(pdp) from previous patch to this (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 239 +++++++++++++++++++++++++++++++++---
> drivers/gpu/drm/i915/i915_gem_gtt.h | 15 ++-
> drivers/gpu/drm/i915/i915_trace.h | 8 ++
> 3 files changed, 246 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 3288154..c498eaa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -210,6 +210,9 @@ static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
> return pde;
> }
>
> +#define gen8_pdpe_encode gen8_pde_encode
> +#define gen8_pml4e_encode gen8_pde_encode
> +
> static gen6_pte_t snb_pte_encode(dma_addr_t addr,
> enum i915_cache_level level,
> bool valid, u32 unused)
> @@ -559,10 +562,73 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
> pdp->page_directory = NULL;
> }
>
> +static struct
> +i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
> +{
> + struct i915_page_directory_pointer *pdp;
> + int ret = -ENOMEM;
> +
> + WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
> +
> + pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
> + if (!pdp)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = __pdp_init(dev, pdp);
> + if (ret)
> + goto fail_bitmap;
> +
> + ret = setup_px(dev, pdp);
> + if (ret)
> + goto fail_page_m;
> +
> + return pdp;
> +
> +fail_page_m:
> + __pdp_fini(pdp);
> +fail_bitmap:
> + kfree(pdp);
> +
> + return ERR_PTR(ret);
> +}
> +
> static void free_pdp(struct drm_device *dev,
> struct i915_page_directory_pointer *pdp)
> {
> __pdp_fini(pdp);
> + if (USES_FULL_48BIT_PPGTT(dev)) {
> + cleanup_px(dev, pdp);
> + kfree(pdp);
> + }
> +}
> +
> +static void
> +gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
> + struct i915_page_directory_pointer *pdp,
> + struct i915_page_directory *pd,
> + int index)
> +{
> + gen8_ppgtt_pdpe_t *page_directorypo;
> +
> + if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
> + return;
> +
> + page_directorypo = kmap_px(pdp);
> + page_directorypo[index] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
> + kunmap_px(ppgtt, page_directorypo);
> +}
> +
> +static void
> +gen8_setup_page_directory_pointer(struct i915_hw_ppgtt *ppgtt,
> + struct i915_pml4 *pml4,
> + struct i915_page_directory_pointer *pdp,
> + int index)
> +{
> + gen8_ppgtt_pml4e_t *pagemap = kmap_px(pml4);
> +
> + WARN_ON(!USES_FULL_48BIT_PPGTT(ppgtt->base.dev));
> + pagemap[index] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
> + kunmap_px(ppgtt, pagemap);
> }
>
> /* Broadwell Page Directory Pointer Descriptors */
> @@ -785,12 +851,9 @@ static void gen8_free_scratch(struct i915_address_space *vm)
> free_scratch_page(dev, vm->scratch_page);
> }
>
> -static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> +static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
> + struct i915_page_directory_pointer *pdp)
> {
> - struct i915_hw_ppgtt *ppgtt =
> - container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> - struct drm_device *dev = ppgtt->base.dev;
> int i;
>
> for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
> @@ -802,6 +865,31 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> }
>
> free_pdp(dev, pdp);
> +}
> +
> +static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
> +{
> + int i;
> +
> + for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
> + if (WARN_ON(!ppgtt->pml4.pdps[i]))
> + continue;
> +
> + gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
> + }
> +
> + cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
> +}
> +
> +static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> +{
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> +
> + if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
> + gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
> + else
> + gen8_ppgtt_cleanup_4lvl(ppgtt);
>
> gen8_free_scratch(vm);
> }
> @@ -923,6 +1011,60 @@ unwind_out:
> return -ENOMEM;
> }
>
> +/**
> + * gen8_ppgtt_alloc_page_dirpointers() - Allocate pdps for VA range.
> + * @vm: Master vm structure.
> + * @pml4: Page map level 4 for this address range.
> + * @start: Starting virtual address to begin allocations.
> + * @length: Size of the allocations.
> + * @new_pdps: Bitmap set by function with new allocations. Likely used by the
> + * caller to free on error.
> + *
> + * Allocate the required number of page directory pointers. Extremely similar to
> + * gen8_ppgtt_alloc_page_directories() and gen8_ppgtt_alloc_pagetabs().
> + * The main difference is here we are limited by the pml4 boundary (instead of
> + * the page directory pointer).
> + *
> + * Return: 0 if success; negative error code otherwise.
> + */
> +static int
> +gen8_ppgtt_alloc_page_dirpointers(struct i915_address_space *vm,
> + struct i915_pml4 *pml4,
> + uint64_t start,
> + uint64_t length,
> + unsigned long *new_pdps)
> +{
> + struct drm_device *dev = vm->dev;
> + struct i915_page_directory_pointer *pdp;
> + uint64_t temp;
> + uint32_t pml4e;
> +
> + WARN_ON(!bitmap_empty(new_pdps, GEN8_PML4ES_PER_PML4));
> +
> + gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
> + if (!test_bit(pml4e, pml4->used_pml4es)) {
> + pdp = alloc_pdp(dev);
> + if (IS_ERR(pdp))
> + goto unwind_out;
> +
> + pml4->pdps[pml4e] = pdp;
> + __set_bit(pml4e, new_pdps);
> + trace_i915_page_directory_pointer_entry_alloc(vm,
> + pml4e,
> + start,
> + GEN8_PML4E_SHIFT);
> + }
> + }
> +
> + return 0;
> +
> +unwind_out:
> + for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
> + free_pdp(dev, pml4->pdps[pml4e]);
> +
> + return -ENOMEM;
> +}
> +
> static void
> free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
> uint32_t pdpes)
> @@ -984,14 +1126,15 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->base.dev)->ring_mask;
> }
>
> -static int gen8_alloc_va_range(struct i915_address_space *vm,
> - uint64_t start, uint64_t length)
> +static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
> + struct i915_page_directory_pointer *pdp,
> + uint64_t start,
> + uint64_t length)
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> unsigned long *new_page_dirs, **new_page_tables;
> struct drm_device *dev = vm->dev;
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> struct i915_page_directory *pd;
> const uint64_t orig_start = start;
> const uint64_t orig_length = length;
> @@ -1072,6 +1215,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
>
> kunmap_px(ppgtt, page_directory);
> __set_bit(pdpe, pdp->used_pdpes);
> + gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
> }
>
> free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> @@ -1092,6 +1236,68 @@ err_out:
> return ret;
> }
>
> +static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
> + struct i915_pml4 *pml4,
> + uint64_t start,
> + uint64_t length)
> +{
> + DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> + struct i915_page_directory_pointer *pdp;
> + uint64_t temp, pml4e;
> + int ret = 0;
> +
> + /* Do the pml4 allocations first, so we don't need to track the newly
> + * allocated tables below the pdp */
> + bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
> +
> + /* The pagedirectory and pagetable allocations are done in the shared 3
> + * and 4 level code. Just allocate the pdps.
> + */
> + ret = gen8_ppgtt_alloc_page_dirpointers(vm, pml4, start, length,
> + new_pdps);
> + if (ret)
> + return ret;
> +
> + WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
> + "The allocation has spanned more than 512GB. "
> + "It is highly likely this is incorrect.");
> +
> + gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
> + WARN_ON(!pdp);
> +
> + ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
> + if (ret)
> + goto err_out;
> +
> + gen8_setup_page_directory_pointer(ppgtt, pml4, pdp, pml4e);
> + }
> +
> + bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
> + GEN8_PML4ES_PER_PML4);
> +
> + return 0;
> +
> +err_out:
> + for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
> + gen8_ppgtt_cleanup_3lvl(vm->dev, pml4->pdps[pml4e]);
> +
> + return ret;
> +}
> +
> +static int gen8_alloc_va_range(struct i915_address_space *vm,
> + uint64_t start, uint64_t length)
> +{
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> +
> + if (USES_FULL_48BIT_PPGTT(vm->dev))
> + return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
> + else
> + return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
> +}
> +
> /*
> * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
> * with a net effect resembling a 2-level page table in normal x86 terms. Each
> @@ -1117,9 +1323,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> - if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> - ret = __pdp_init(false, &ppgtt->pdp);
> + if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
> + if (ret)
> + goto free_scratch;
>
> + ppgtt->base.total = 1ULL << 48;
> + } else {
> + ret = __pdp_init(false, &ppgtt->pdp);
> if (ret)
> goto free_scratch;
>
> @@ -1131,10 +1342,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> * 2GiB).
> */
> ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> - } else {
> - ppgtt->base.total = 1ULL << 48;
> - ret = -EPERM; /* Not yet implemented */
> - goto free_scratch;
> +
> + trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
> + 0, 0,
> + GEN8_PML4E_SHIFT);
> }
>
> return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 04bc66f..11d44b3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -39,6 +39,8 @@ struct drm_i915_file_private;
> typedef uint32_t gen6_pte_t;
> typedef uint64_t gen8_pte_t;
> typedef uint64_t gen8_pde_t;
> +typedef uint64_t gen8_ppgtt_pdpe_t;
> +typedef uint64_t gen8_ppgtt_pml4e_t;
>
> #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
>
> @@ -95,6 +97,7 @@ typedef uint64_t gen8_pde_t;
> */
> #define GEN8_PML4ES_PER_PML4 512
> #define GEN8_PML4E_SHIFT 39
> +#define GEN8_PML4E_MASK (GEN8_PML4ES_PER_PML4 - 1)
> #define GEN8_PDPE_SHIFT 30
> /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
> * tables */
> @@ -465,6 +468,15 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
> temp = min(temp, length), \
> start += temp, length -= temp)
>
> +#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter) \
> + for (iter = gen8_pml4e_index(start); \
> + pdp = (pml4)->pdps[iter], \
> + length > 0 && iter < GEN8_PML4ES_PER_PML4; \
> + iter++, \
> + temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start, \
> + temp = min(temp, length), \
> + start += temp, length -= temp)
> +
> static inline uint32_t gen8_pte_index(uint64_t address)
> {
> return i915_pte_index(address, GEN8_PDE_SHIFT);
> @@ -482,8 +494,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
>
> static inline uint32_t gen8_pml4e_index(uint64_t address)
> {
> - WARN_ON(1); /* For 64B */
> - return 0;
> + return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
> }
>
> static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index f230d76..e6b5c74 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -221,6 +221,14 @@ DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_entry_alloc,
> __entry->vm, __entry->px, __entry->start, __entry->end)
> );
>
> +DEFINE_EVENT_PRINT(i915_px_entry, i915_page_directory_pointer_entry_alloc,
> + TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
> + TP_ARGS(vm, pml4e, start, pml4e_shift),
> +
> + TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
> + __entry->vm, __entry->px, __entry->start, __entry->end)
> +);
> +
> /* Avoid extra math because we only support two sizes. The format is defined by
> * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
> #define TRACE_PT_SIZE(bits) \
>
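
As a rough illustration of the index arithmetic behind gen8_pml4e_index() and gen8_for_each_pml4e() in the quoted patch, here is a minimal user-space sketch. It takes the shift and entry count from the quoted i915_gem_gtt.h hunk (shift 39, 512 entries per PML4). The helper names pml4e_index, pml4e_chunk_len and pml4es_spanned are hypothetical, not driver functions, and the sketch assumes the range fits within the 48-bit address space.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the quoted i915_gem_gtt.h hunk. */
#define PML4ES_PER_PML4	512
#define PML4E_SHIFT	39
#define PML4E_MASK	(PML4ES_PER_PML4 - 1)
#define VA_TOTAL	(1ULL << 48)

/* Hypothetical helper: which PML4 entry covers this address. */
static inline uint32_t pml4e_index(uint64_t address)
{
	return (uint32_t)((address >> PML4E_SHIFT) & PML4E_MASK);
}

/*
 * Hypothetical helper: bytes from start up to the next 512GB boundary,
 * clamped to length. This is the per-iteration step the macro computes
 * with ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT) - start.
 */
static inline uint64_t pml4e_chunk_len(uint64_t start, uint64_t length)
{
	uint64_t span = 1ULL << PML4E_SHIFT;
	uint64_t to_boundary = span - (start & (span - 1));

	return to_boundary < length ? to_boundary : length;
}

/* Hypothetical helper: PML4 entries touched by [start, start + length). */
static inline uint32_t pml4es_spanned(uint64_t start, uint64_t length)
{
	uint32_t count = 0;

	/* Precondition of this sketch: the range lies within 48 bits. */
	assert(start <= VA_TOTAL && length <= VA_TOTAL - start);

	while (length > 0) {
		uint64_t chunk = pml4e_chunk_len(start, length);

		start += chunk;
		length -= chunk;
		count++;
	}

	return count;
}
```

Each PML4 entry covers 1ULL << 39 bytes (512GB), so a range only touches more than one entry if it crosses a 512GB boundary.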
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v7 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
2015-07-30 10:06 ` [PATCH v7 " Michel Thierry
@ 2015-07-31 4:23 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 4:23 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 7/30/2015 3:36 PM, Michel Thierry wrote:
> In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
> the base address to PML4, while the other PDP registers are ignored.
>
> In LRC, the addressing mode must be specified in every context
> descriptor, and the base address to PML4 is stored in the reg state.
>
> v2: PML4 update in legacy context switch is left for historic reasons,
> the preferred mode of operation is with lrc context based submission.
> v3: s/gen8_map_page_directory/gen8_setup_page_directory and
> s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
> Also, clflush will be needed for bxt. (Akash)
> v4: Squashed lrc-specific code and use a macro to set PML4 register.
> v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> PDP update in bb_start is only for legacy 32b mode.
> v6: Rebase after final merged version of Mika's ppgtt/scratch
> patches.
> v7: There is no need to update the pml4 register value in
> execlists_update_context. (Akash)
> v8: Move pd and pdp setup functions to a previous patch, they do not
> belong here. (Akash)
> v9: Check USES_FULL_48BIT_PPGTT instead of GEN8_CTX_ADDRESSING_MODE in
> gen8_emit_bb_start to check if emit pdps is needed. (Akash)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 17 +++++++----
> drivers/gpu/drm/i915/i915_reg.h | 1 +
> drivers/gpu/drm/i915/intel_lrc.c | 60 ++++++++++++++++++++++++++-----------
> 3 files changed, 55 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index c498eaa..ae2e082 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -656,8 +656,8 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
> return 0;
> }
>
> -static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> - struct drm_i915_gem_request *req)
> +static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
> + struct drm_i915_gem_request *req)
> {
> int i, ret;
>
> @@ -672,6 +672,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
> return 0;
> }
>
> +static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
> + struct drm_i915_gem_request *req)
> +{
> + return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
> +}
> +
> static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> struct i915_page_directory_pointer *pdp,
> uint64_t start,
> @@ -1321,14 +1327,13 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> ppgtt->base.unbind_vma = ppgtt_unbind_vma;
> ppgtt->base.bind_vma = ppgtt_bind_vma;
>
> - ppgtt->switch_mm = gen8_mm_switch;
> -
> if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
> if (ret)
> goto free_scratch;
>
> ppgtt->base.total = 1ULL << 48;
> + ppgtt->switch_mm = gen8_48b_mm_switch;
> } else {
> ret = __pdp_init(false, &ppgtt->pdp);
> if (ret)
> @@ -1343,6 +1348,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> */
> ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
>
> + ppgtt->switch_mm = gen8_legacy_mm_switch;
> trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base,
> 0, 0,
> GEN8_PML4E_SHIFT);
> @@ -1540,8 +1546,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
> int j;
>
> for_each_ring(ring, dev_priv, j) {
> + u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_48B : 0;
> I915_WRITE(RING_MODE_GEN7(ring),
> - _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
> + _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
> }
> }
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 3a77678..5bd1b6a 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1670,6 +1670,7 @@ enum skl_disp_power_wells {
> #define GFX_REPLAY_MODE (1<<11)
> #define GFX_PSMI_GRANULARITY (1<<10)
> #define GFX_PPGTT_ENABLE (1<<9)
> +#define GEN8_GFX_PPGTT_48B (1<<7)
>
> #define VLV_DISPLAY_BASE 0x180000
> #define VLV_MIPI_BASE VLV_DISPLAY_BASE
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 99bba8e..4c40614 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -196,13 +196,21 @@
> reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
> }
>
> +#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
> + reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(px_dma(&ppgtt->pml4)); \
> + reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
> +}
> +
> enum {
> ADVANCED_CONTEXT = 0,
> - LEGACY_CONTEXT,
> + LEGACY_32B_CONTEXT,
> ADVANCED_AD_CONTEXT,
> LEGACY_64B_CONTEXT
> };
> -#define GEN8_CTX_MODE_SHIFT 3
> +#define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
> +#define GEN8_CTX_ADDRESSING_MODE(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> + LEGACY_64B_CONTEXT :\
> + LEGACY_32B_CONTEXT)
> enum {
> FAULT_AND_HANG = 0,
> FAULT_AND_HALT, /* Debug only */
> @@ -273,7 +281,7 @@ static uint64_t execlists_ctx_descriptor(struct drm_i915_gem_request *rq)
> WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>
> desc = GEN8_CTX_VALID;
> - desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
> + desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
> if (IS_GEN8(ctx_obj->base.dev))
> desc |= GEN8_CTX_L3LLC_COHERENT;
> desc |= GEN8_CTX_PRIVILEGE;
> @@ -348,10 +356,12 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
> reg_state[CTX_RING_TAIL+1] = rq->tail;
> reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
>
> - /* True PPGTT with dynamic page allocation: update PDP registers and
> - * point the unallocated PDPs to the scratch page
> - */
> - if (ppgtt) {
> + if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + /* True 32b PPGTT with dynamic page allocation: update PDP
> + * registers and point the unallocated PDPs to scratch page.
> + * PML4 is allocated during ppgtt init, so this is not needed
> + * in 48-bit mode.
> + */
> ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> @@ -1512,12 +1522,15 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
> * Ideally, we should set Force PD Restore in ctx descriptor,
> * but we can't. Force Restore would be a second option, but
> * it is unsafe in case of lite-restore (because the ctx is
> - * not idle). */
> + * not idle). PML4 is allocated during ppgtt init so this is
> + * not needed in 48-bit.*/
> if (req->ctx->ppgtt &&
> (intel_ring_flag(req->ring) & req->ctx->ppgtt->pd_dirty_rings)) {
> - ret = intel_logical_ring_emit_pdps(req);
> - if (ret)
> - return ret;
> + if (!USES_FULL_48BIT_PPGTT(req->i915)) {
> + ret = intel_logical_ring_emit_pdps(req);
> + if (ret)
> + return ret;
> + }
>
> req->ctx->ppgtt->pd_dirty_rings &= ~intel_ring_flag(req->ring);
> }
> @@ -2198,13 +2211,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
> reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
>
> - /* With dynamic page allocation, PDPs may not be allocated at this point,
> - * Point the unallocated PDPs to the scratch page
> - */
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> - ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> + if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + /* 64b PPGTT (48bit canonical)
> + * PDP0_DESCRIPTOR contains the base address to PML4 and
> + * other PDP Descriptors are ignored.
> + */
> + ASSIGN_CTX_PML4(ppgtt, reg_state);
> + } else {
> + /* 32b PPGTT
> + * PDP*_DESCRIPTOR contains the base address of space supported.
> + * With dynamic page allocation, PDPs may not be allocated at
> + * this point. Point the unallocated PDPs to the scratch page
> + */
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> + ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> + }
> +
> if (ring->id == RCS) {
> reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
>
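
To make the two LRC changes in the quoted patch concrete, here is a small user-space sketch of (a) selecting the addressing mode bits for the context descriptor and (b) splitting the PML4 base address into the upper and lower dwords stored in the PDP0 slots. The enum order and the shift value of 3 come from the quoted intel_lrc.c hunk. The names ctx_mode_bits, struct pdp0_regs and split_pml4_addr are hypothetical stand-ins; the driver does this with GEN8_CTX_ADDRESSING_MODE and ASSIGN_CTX_PML4.

```c
#include <stdint.h>

/* Addressing modes, in the order of the enum in the quoted hunk. */
enum ctx_addressing_mode {
	ADVANCED_CONTEXT = 0,
	LEGACY_32B_CONTEXT,
	ADVANCED_AD_CONTEXT,
	LEGACY_64B_CONTEXT
};

#define CTX_ADDRESSING_MODE_SHIFT 3

/* Hypothetical helper: addressing mode field of the context descriptor. */
static inline uint64_t ctx_mode_bits(int uses_48bit_ppgtt)
{
	uint64_t mode = uses_48bit_ppgtt ? LEGACY_64B_CONTEXT
					 : LEGACY_32B_CONTEXT;

	return mode << CTX_ADDRESSING_MODE_SHIFT;
}

/* Hypothetical stand-in for the PDP0 UDW/LDW values in the reg state. */
struct pdp0_regs {
	uint32_t udw;	/* upper 32 bits of the PML4 base address */
	uint32_t ldw;	/* lower 32 bits of the PML4 base address */
};

/* Hypothetical helper: split a 64-bit DMA address into two dwords. */
static inline struct pdp0_regs split_pml4_addr(uint64_t pml4_dma)
{
	struct pdp0_regs regs = {
		.udw = (uint32_t)(pml4_dma >> 32),
		.ldw = (uint32_t)(pml4_dma & 0xffffffffu),
	};

	return regs;
}
```

In 48-bit mode only the PDP0 pair is filled, which matches the commit message: PDP0 holds the PML4 base address and the other PDP registers are ignored.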
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v7 06/19] drm/i915/gen8: Add PML4 structure
2015-07-30 10:04 ` [PATCH v7 " Michel Thierry
@ 2015-07-31 4:35 ` Goel, Akash
2015-07-31 12:12 ` [PATCH v8 " Michel Thierry
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 4:35 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
On 7/30/2015 3:34 PM, Michel Thierry wrote:
> Introduces the Page Map Level 4 (PML4), ie. the new top level structure
> of the page tables.
>
> To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
>
> v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
> 32/64-bit safe (Chris).
> v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
> v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 3 ++-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 36 +++++++++++++++++++++++-------------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++-----
> 3 files changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 04aa34a..4729eaf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
> #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
> #define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
> #define USES_PPGTT(dev) (i915.enable_ppgtt)
> -#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
> +#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
> +#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
>
> #define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
> #define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7f71746..3288154 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -104,9 +104,12 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> {
> bool has_aliasing_ppgtt;
> bool has_full_ppgtt;
> + bool has_full_64bit_ppgtt;
>
> has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
> has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
> + has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
> + INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
Sorry for the late comment.
Would it be better to move the changes made to this sanitize function
into the later patch, 'Flip the 48b switch'?
Even without these changes, setting enable_ppgtt to 3 would still be
equivalent to the default mode (-1).
All that is really needed at this point is the definition of the
'USES_FULL_48BIT_PPGTT' macro.
Best regards
Akash
>
> if (intel_vgpu_active(dev))
> has_full_ppgtt = false; /* emulation is too hard */
> @@ -125,6 +128,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> if (enable_ppgtt == 2 && has_full_ppgtt)
> return 2;
>
> + if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
> + return 3;
> +
> #ifdef CONFIG_INTEL_IOMMU
> /* Disable ppgtt on SNB if VT-d is on. */
> if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
> @@ -689,9 +695,6 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> pt_vaddr = NULL;
>
> for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> - if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
> - break;
> -
> if (pt_vaddr == NULL) {
> struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> @@ -1105,14 +1108,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> return ret;
>
> ppgtt->base.start = 0;
> - ppgtt->base.total = 1ULL << 32;
> - if (IS_ENABLED(CONFIG_X86_32))
> - /* While we have a proliferation of size_t variables
> - * we cannot represent the full ppgtt size on 32bit,
> - * so limit it to the same size as the GGTT (currently
> - * 2GiB).
> - */
> - ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> ppgtt->base.allocate_va_range = gen8_alloc_va_range;
> ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> @@ -1122,10 +1117,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> - ret = __pdp_init(false, &ppgtt->pdp);
> + if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + ret = __pdp_init(false, &ppgtt->pdp);
>
> - if (ret)
> + if (ret)
> + goto free_scratch;
> +
> + ppgtt->base.total = 1ULL << 32;
> + if (IS_ENABLED(CONFIG_X86_32))
> + /* While we have a proliferation of size_t variables
> + * we cannot represent the full ppgtt size on 32bit,
> + * so limit it to the same size as the GGTT (currently
> + * 2GiB).
> + */
> + ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> + } else {
> + ppgtt->base.total = 1ULL << 48;
> + ret = -EPERM; /* Not yet implemented */
> goto free_scratch;
> + }
>
> return 0;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 87e389c..04bc66f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
> * PDPE | PDE | PTE | offset
> * The difference as compared to normal x86 3 level page table is the PDPEs are
> * programmed via register.
> + *
> + * GEN8 48b legacy style address is defined as a 4 level page table:
> + * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
> + * PML4E | PDPE | PDE | PTE | offset
> */
> +#define GEN8_PML4ES_PER_PML4 512
> +#define GEN8_PML4E_SHIFT 39
> #define GEN8_PDPE_SHIFT 30
> -#define GEN8_PDPE_MASK 0x3
> +/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
> + * tables */
> +#define GEN8_PDPE_MASK 0x1ff
> #define GEN8_PDE_SHIFT 21
> #define GEN8_PDE_MASK 0x1ff
> #define GEN8_PTE_SHIFT 12
> @@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
> #define GEN8_LEGACY_PDPES 4
> #define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
>
> -/* FIXME: Next patch will use dev */
> -#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
> +#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> + GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
>
> #define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
> #define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
> @@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
> struct i915_page_directory **page_directory;
> };
>
> +struct i915_pml4 {
> + struct i915_page_dma base;
> +
> + DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
> + struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
> +};
> +
> struct i915_address_space {
> struct drm_mm mm;
> struct drm_device *dev;
> @@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
> struct drm_mm_node node;
> unsigned long pd_dirty_rings;
> union {
> - struct i915_page_directory_pointer pdp;
> - struct i915_page_directory pd;
> + struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
> + struct i915_page_directory_pointer pdp; /* GEN8+ */
> + struct i915_page_directory pd; /* GEN6-7 */
> };
>
> struct drm_i915_file_private *file_priv;
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v8 06/19] drm/i915/gen8: Add PML4 structure
2015-07-30 10:04 ` [PATCH v7 " Michel Thierry
2015-07-31 4:35 ` Goel, Akash
@ 2015-07-31 12:12 ` Michel Thierry
2015-07-31 17:35 ` Goel, Akash
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
1 sibling, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-31 12:12 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Introduces the Page Map Level 4 (PML4), i.e. the new top-level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 30 +++++++++++++++++-------------
drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++-----
3 files changed, 40 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 04aa34a..4729eaf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
#define USES_PPGTT(dev) (i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
+#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
#define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
#define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7f71746..ba99b67 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -689,9 +689,6 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
- if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
- break;
-
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -1105,14 +1102,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
return ret;
ppgtt->base.start = 0;
- ppgtt->base.total = 1ULL << 32;
- if (IS_ENABLED(CONFIG_X86_32))
- /* While we have a proliferation of size_t variables
- * we cannot represent the full ppgtt size on 32bit,
- * so limit it to the same size as the GGTT (currently
- * 2GiB).
- */
- ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
ppgtt->base.allocate_va_range = gen8_alloc_va_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
@@ -1122,10 +1111,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
- ret = __pdp_init(false, &ppgtt->pdp);
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = __pdp_init(false, &ppgtt->pdp);
- if (ret)
+ if (ret)
+ goto free_scratch;
+
+ ppgtt->base.total = 1ULL << 32;
+ if (IS_ENABLED(CONFIG_X86_32))
+ /* While we have a proliferation of size_t variables
+ * we cannot represent the full ppgtt size on 32bit,
+ * so limit it to the same size as the GGTT (currently
+ * 2GiB).
+ */
+ ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+ } else {
+ ppgtt->base.total = 1ULL << 48;
+ ret = -EPERM; /* Not yet implemented */
goto free_scratch;
+ }
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 87e389c..04bc66f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
* PDPE | PDE | PTE | offset
* The difference as compared to normal x86 3 level page table is the PDPEs are
* programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
+ * PML4E | PDPE | PDE | PTE | offset
*/
+#define GEN8_PML4ES_PER_PML4 512
+#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT 30
-#define GEN8_PDPE_MASK 0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK 0x1ff
#define GEN8_PDE_SHIFT 21
#define GEN8_PDE_MASK 0x1ff
#define GEN8_PTE_SHIFT 12
@@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
-/* FIXME: Next patch will use dev */
-#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
+#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
@@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
struct i915_page_directory **page_directory;
};
+struct i915_pml4 {
+ struct i915_page_dma base;
+
+ DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+ struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
+};
+
struct i915_address_space {
struct drm_mm mm;
struct drm_device *dev;
@@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
struct drm_mm_node node;
unsigned long pd_dirty_rings;
union {
- struct i915_page_directory_pointer pdp;
- struct i915_page_directory pd;
+ struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
+ struct i915_page_directory_pointer pdp; /* GEN8+ */
+ struct i915_page_directory pd; /* GEN6-7 */
};
struct drm_i915_file_private *file_priv;
--
2.5.0
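As an illustration of the 4-level layout described in the i915_gem_gtt.h comment (47:39 | 38:30 | 29:21 | 20:12 | 11:0), here is a minimal sketch of how a 48b GPU virtual address decomposes into table indices. The shift values and the 0x1ff PDPE/PDE masks are the ones from the patch; the PML4E, PTE and offset masks, along with every sketch_* name, are assumptions made for this example and are not taken from the driver.

```c
#include <stdint.h>

/* Shifts and masks as defined in the patch. */
#define GEN8_PML4E_SHIFT	39
#define GEN8_PDPE_SHIFT		30
#define GEN8_PDPE_MASK		0x1ff
#define GEN8_PDE_SHIFT		21
#define GEN8_PDE_MASK		0x1ff
#define GEN8_PTE_SHIFT		12

/* Assumed for this sketch only; not defined in the quoted hunks. */
#define SKETCH_PML4E_MASK	0x1ff	/* 512 entries per PML4 */
#define SKETCH_PTE_MASK		0x1ff	/* 512 PTEs per page table */
#define SKETCH_OFFSET_MASK	0xfff	/* 4 KiB pages */

/* Hypothetical container for the decoded indices. */
struct sketch_indices {
	unsigned int pml4e;	/* bits 47:39 */
	unsigned int pdpe;	/* bits 38:30 */
	unsigned int pde;	/* bits 29:21 */
	unsigned int pte;	/* bits 20:12 */
	unsigned int offset;	/* bits 11:0  */
};

/* Split a 48b address into one index per level plus the page offset. */
static struct sketch_indices sketch_decode(uint64_t addr)
{
	struct sketch_indices idx;

	idx.pml4e  = (addr >> GEN8_PML4E_SHIFT) & SKETCH_PML4E_MASK;
	idx.pdpe   = (addr >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
	idx.pde    = (addr >> GEN8_PDE_SHIFT) & GEN8_PDE_MASK;
	idx.pte    = (addr >> GEN8_PTE_SHIFT) & SKETCH_PTE_MASK;
	idx.offset = addr & SKETCH_OFFSET_MASK;
	return idx;
}
```

For any address below 4GB the PML4E index is 0 and the PDPE index stays in the range 0-3. That matches the NB comment in the patch, which notes that widening GEN8_PDPE_MASK to 0x1ff has no impact on 32b page tables.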
^ permalink raw reply related [flat|nested] 82+ messages in thread
* [PATCH v8 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-30 10:09 ` [PATCH v7 " Michel Thierry
@ 2015-07-31 12:13 ` Michel Thierry
2015-07-31 12:19 ` Chris Wilson
2015-07-31 12:35 ` Michel Thierry
0 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-07-31 12:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Use 48b addresses if the hw supports it (i915.enable_ppgtt=3).
Update sanitize_enable_ppgtt for 48-bit PPGTT mode.
Note, aliasing PPGTT remains 32b only.
v2: s/full_64b/full_48b/. (Akash)
v3: Postpone the sanitize_enable_ppgtt changes until this patch. (Akash)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 7 ++++++-
drivers/gpu/drm/i915/i915_params.c | 2 +-
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7a526f9..31d20c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,11 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
{
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+ bool has_full_48bit_ppgtt;
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+ has_full_48bit_ppgtt = IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9;
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +127,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
+ if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
+ return 3;
+
#ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -141,7 +146,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
}
if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
- return 2;
+ return has_full_48bit_ppgtt ? 3 : 2;
else
return has_aliasing_ppgtt ? 1 : 0;
}
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 5ae4b0a..900e48a 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
MODULE_PARM_DESC(enable_ppgtt,
"Override PPGTT usage. "
- "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+ "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_48b)");
module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
MODULE_PARM_DESC(enable_execlists,
--
2.5.0
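To make the resulting mode selection easier to follow, the visible parts of sanitize_enable_ppgtt can be modelled as a standalone function. This is a sketch under stated assumptions, not the driver function: struct sketch_dev and its fields are hypothetical stand-ins for INTEL_INFO(dev), IS_BROADWELL(dev), intel_vgpu_active(dev) and i915.enable_execlists, and every check that lies outside the quoted hunks (including the handling of enable_ppgtt 0 and 1 and the IOMMU quirk) is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state the real function reads. */
struct sketch_dev {
	int gen;
	bool is_broadwell;
	bool vgpu_active;
	bool execlists;
};

/*
 * Models only the lines visible in the diff. The sketch is therefore
 * only meaningful for enable_ppgtt values of -1 (auto), 2 and 3.
 */
static int sketch_sanitize_enable_ppgtt(const struct sketch_dev *dev,
					int enable_ppgtt)
{
	bool has_aliasing_ppgtt = dev->gen >= 6;
	bool has_full_ppgtt = dev->gen >= 7;
	bool has_full_48bit_ppgtt = dev->is_broadwell || dev->gen >= 9;

	if (dev->vgpu_active)
		has_full_ppgtt = false;

	/* Checks between the quoted hunks are omitted here. */

	if (enable_ppgtt == 2 && has_full_ppgtt)
		return 2;

	if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
		return 3;

	/* IOMMU and other platform quirks are omitted here. */

	if (dev->gen >= 8 && dev->execlists)
		return has_full_48bit_ppgtt ? 3 : 2;

	return has_aliasing_ppgtt ? 1 : 0;
}
```

With this model, auto mode (-1) on Broadwell with execlists resolves to 3, while a gen8 part that is not Broadwell resolves to 2, and a request for 3 on hardware without 48b support falls through to the auto selection.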
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v8 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-31 12:13 ` [PATCH v8 " Michel Thierry
@ 2015-07-31 12:19 ` Chris Wilson
2015-07-31 12:35 ` Michel Thierry
1 sibling, 0 replies; 82+ messages in thread
From: Chris Wilson @ 2015-07-31 12:19 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, Akash Goel
On Fri, Jul 31, 2015 at 01:13:00PM +0100, Michel Thierry wrote:
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 5ae4b0a..900e48a 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
> module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
> MODULE_PARM_DESC(enable_ppgtt,
> "Override PPGTT usage. "
> - "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
> + "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_48b)");
3=full with extended address space
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v8 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-31 12:13 ` [PATCH v8 " Michel Thierry
2015-07-31 12:19 ` Chris Wilson
@ 2015-07-31 12:35 ` Michel Thierry
2015-07-31 17:21 ` Goel, Akash
1 sibling, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-07-31 12:35 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Use 48b addresses if the hw supports it (i915.enable_ppgtt=3).
Update sanitize_enable_ppgtt for 48-bit PPGTT mode.
Note, aliasing PPGTT remains 32b only.
v2: s/full_64b/full_48b/. (Akash)
v3: Postpone the sanitize_enable_ppgtt changes until this patch. (Akash)
v4: Update param description (Chris)
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 7 ++++++-
drivers/gpu/drm/i915/i915_params.c | 2 +-
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7a526f9..31d20c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,11 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
{
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+ bool has_full_48bit_ppgtt;
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+ has_full_48bit_ppgtt = IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9;
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +127,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
+ if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
+ return 3;
+
#ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -141,7 +146,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
}
if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
- return 2;
+ return has_full_48bit_ppgtt ? 3 : 2;
else
return has_aliasing_ppgtt ? 1 : 0;
}
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 5ae4b0a..900e48a 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
MODULE_PARM_DESC(enable_ppgtt,
"Override PPGTT usage. "
- "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+ "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full with extended address space)");
module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
MODULE_PARM_DESC(enable_execlists,
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
* Re: [PATCH v8 18/19] drm/i915/gen8: Flip the 48b switch
2015-07-31 12:35 ` Michel Thierry
@ 2015-07-31 17:21 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 17:21 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch and it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 7/31/2015 6:05 PM, Michel Thierry wrote:
> Use 48b addresses if hw supports it (i915.enable_ppgtt=3).
> Update the sanitize_enable_ppgtt for 48 bit PPGTT mode.
>
> Note, aliasing PPGTT remains 32b only.
>
> v2: s/full_64b/full_48b/. (Akash)
> v3: Add sanitize_enable_ppgtt changes until here. (Akash)
> v4: Update param description (Chris)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 7 ++++++-
> drivers/gpu/drm/i915/i915_params.c | 2 +-
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7a526f9..31d20c6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -104,9 +104,11 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> {
> bool has_aliasing_ppgtt;
> bool has_full_ppgtt;
> + bool has_full_48bit_ppgtt;
>
> has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
> has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
> + has_full_48bit_ppgtt = IS_BROADWELL(dev) || INTEL_INFO(dev)->gen >= 9;
>
> if (intel_vgpu_active(dev))
> has_full_ppgtt = false; /* emulation is too hard */
> @@ -125,6 +127,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> if (enable_ppgtt == 2 && has_full_ppgtt)
> return 2;
>
> + if (enable_ppgtt == 3 && has_full_48bit_ppgtt)
> + return 3;
> +
> #ifdef CONFIG_INTEL_IOMMU
> /* Disable ppgtt on SNB if VT-d is on. */
> if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
> @@ -141,7 +146,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
> }
>
> if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
> - return 2;
> + return has_full_48bit_ppgtt ? 3 : 2;
> else
> return has_aliasing_ppgtt ? 1 : 0;
> }
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 5ae4b0a..900e48a 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -111,7 +111,7 @@ MODULE_PARM_DESC(enable_hangcheck,
> module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
> MODULE_PARM_DESC(enable_ppgtt,
> "Override PPGTT usage. "
> - "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
> + "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full with extended address space)");
>
> module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
> MODULE_PARM_DESC(enable_execlists,
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v8 06/19] drm/i915/gen8: Add PML4 structure
2015-07-31 12:12 ` [PATCH v8 " Michel Thierry
@ 2015-07-31 17:35 ` Goel, Akash
2015-08-03 8:34 ` Michel Thierry
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Goel, Akash @ 2015-07-31 17:35 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
On 7/31/2015 5:42 PM, Michel Thierry wrote:
> Introduces the Page Map Level 4 (PML4), ie. the new top level structure
> of the page tables.
>
> To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
>
> v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
> 32/64-bit safe (Chris).
> v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
> v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
> v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 3 ++-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 30 +++++++++++++++++-------------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++-----
> 3 files changed, 40 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 04aa34a..4729eaf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
> #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
> #define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
> #define USES_PPGTT(dev) (i915.enable_ppgtt)
> -#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
> +#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
> +#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
>
> #define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
> #define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7f71746..ba99b67 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -689,9 +689,6 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> pt_vaddr = NULL;
>
> for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> - if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
> - break;
> -
Apologies for this eleventh-hour comment.
Would this change be better off in the later patch "Add 4 level support
in insert_entries and clear_range"?
Best regards
Akash
> if (pt_vaddr == NULL) {
> struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> @@ -1105,14 +1102,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> return ret;
>
> ppgtt->base.start = 0;
> - ppgtt->base.total = 1ULL << 32;
> - if (IS_ENABLED(CONFIG_X86_32))
> - /* While we have a proliferation of size_t variables
> - * we cannot represent the full ppgtt size on 32bit,
> - * so limit it to the same size as the GGTT (currently
> - * 2GiB).
> - */
> - ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> ppgtt->base.allocate_va_range = gen8_alloc_va_range;
> ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> @@ -1122,10 +1111,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> - ret = __pdp_init(false, &ppgtt->pdp);
> + if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + ret = __pdp_init(false, &ppgtt->pdp);
>
> - if (ret)
> + if (ret)
> + goto free_scratch;
> +
> + ppgtt->base.total = 1ULL << 32;
> + if (IS_ENABLED(CONFIG_X86_32))
> + /* While we have a proliferation of size_t variables
> + * we cannot represent the full ppgtt size on 32bit,
> + * so limit it to the same size as the GGTT (currently
> + * 2GiB).
> + */
> + ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> + } else {
> + ppgtt->base.total = 1ULL << 48;
> + ret = -EPERM; /* Not yet implemented */
> goto free_scratch;
> + }
>
> return 0;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 87e389c..04bc66f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
> * PDPE | PDE | PTE | offset
> * The difference as compared to normal x86 3 level page table is the PDPEs are
> * programmed via register.
> + *
> + * GEN8 48b legacy style address is defined as a 4 level page table:
> + * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
> + * PML4E | PDPE | PDE | PTE | offset
> */
> +#define GEN8_PML4ES_PER_PML4 512
> +#define GEN8_PML4E_SHIFT 39
> #define GEN8_PDPE_SHIFT 30
> -#define GEN8_PDPE_MASK 0x3
> +/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
> + * tables */
> +#define GEN8_PDPE_MASK 0x1ff
> #define GEN8_PDE_SHIFT 21
> #define GEN8_PDE_MASK 0x1ff
> #define GEN8_PTE_SHIFT 12
> @@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
> #define GEN8_LEGACY_PDPES 4
> #define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
>
> -/* FIXME: Next patch will use dev */
> -#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
> +#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> + GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
>
> #define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
> #define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
> @@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
> struct i915_page_directory **page_directory;
> };
>
> +struct i915_pml4 {
> + struct i915_page_dma base;
> +
> + DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
> + struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
> +};
> +
> struct i915_address_space {
> struct drm_mm mm;
> struct drm_device *dev;
> @@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
> struct drm_mm_node node;
> unsigned long pd_dirty_rings;
> union {
> - struct i915_page_directory_pointer pdp;
> - struct i915_page_directory pd;
> + struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
> + struct i915_page_directory_pointer pdp; /* GEN8+ */
> + struct i915_page_directory pd; /* GEN6-7 */
> };
>
> struct drm_i915_file_private *file_priv;
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v8 06/19] drm/i915/gen8: Add PML4 structure
2015-07-31 17:35 ` Goel, Akash
@ 2015-08-03 8:34 ` Michel Thierry
0 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-03 8:34 UTC (permalink / raw)
To: Goel, Akash, intel-gfx
On 7/31/2015 6:35 PM, Goel, Akash wrote:
> On 7/31/2015 5:42 PM, Michel Thierry wrote:
>> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
>> @@ -689,9 +689,6 @@ gen8_ppgtt_insert_pte_entries(struct
>> i915_address_space *vm,
>> pt_vaddr = NULL;
>>
>> for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
>> - if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
>> - break;
>> -
> Apologies for this eleventh-hour comment.
> Would this change be better off in the later patch "Add 4 level support
> in insert_entries and clear_range"?
Makes sense.
^ permalink raw reply [flat|nested] 82+ messages in thread
* [PATCH v9 06/19] drm/i915/gen8: Add PML4 structure
2015-07-31 12:12 ` [PATCH v8 " Michel Thierry
2015-07-31 17:35 ` Goel, Akash
@ 2015-08-03 8:52 ` Michel Thierry
2015-08-03 9:20 ` Goel, Akash
1 sibling, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-08-03 8:52 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
Introduces the Page Map Level 4 (PML4), i.e. the new top-level structure
of the page tables.
To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
32/64-bit safe (Chris).
v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
v6: Keep _insert_pte_entries changes outside this patch (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 27 +++++++++++++++++----------
drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++-----
3 files changed, 40 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 04aa34a..4729eaf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
#define USES_PPGTT(dev) (i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
+#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
#define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
#define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7f71746..e099c18 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1105,14 +1105,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
return ret;
ppgtt->base.start = 0;
- ppgtt->base.total = 1ULL << 32;
- if (IS_ENABLED(CONFIG_X86_32))
- /* While we have a proliferation of size_t variables
- * we cannot represent the full ppgtt size on 32bit,
- * so limit it to the same size as the GGTT (currently
- * 2GiB).
- */
- ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
ppgtt->base.allocate_va_range = gen8_alloc_va_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
@@ -1122,10 +1114,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
- ret = __pdp_init(false, &ppgtt->pdp);
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = __pdp_init(false, &ppgtt->pdp);
- if (ret)
+ if (ret)
+ goto free_scratch;
+
+ ppgtt->base.total = 1ULL << 32;
+ if (IS_ENABLED(CONFIG_X86_32))
+ /* While we have a proliferation of size_t variables
+ * we cannot represent the full ppgtt size on 32bit,
+ * so limit it to the same size as the GGTT (currently
+ * 2GiB).
+ */
+ ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
+ } else {
+ ppgtt->base.total = 1ULL << 48;
+ ret = -EPERM; /* Not yet implemented */
goto free_scratch;
+ }
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 87e389c..04bc66f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
* PDPE | PDE | PTE | offset
* The difference as compared to normal x86 3 level page table is the PDPEs are
* programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
+ * PML4E | PDPE | PDE | PTE | offset
*/
+#define GEN8_PML4ES_PER_PML4 512
+#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT 30
-#define GEN8_PDPE_MASK 0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK 0x1ff
#define GEN8_PDE_SHIFT 21
#define GEN8_PDE_MASK 0x1ff
#define GEN8_PTE_SHIFT 12
@@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
-/* FIXME: Next patch will use dev */
-#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
+#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
@@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
struct i915_page_directory **page_directory;
};
+struct i915_pml4 {
+ struct i915_page_dma base;
+
+ DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+ struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
+};
+
struct i915_address_space {
struct drm_mm mm;
struct drm_device *dev;
@@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
struct drm_mm_node node;
unsigned long pd_dirty_rings;
union {
- struct i915_page_directory_pointer pdp;
- struct i915_page_directory pd;
+ struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
+ struct i915_page_directory_pointer pdp; /* GEN8+ */
+ struct i915_page_directory pd; /* GEN6-7 */
};
struct drm_i915_file_private *file_priv;
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
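As an illustration of the 4-level layout described in the header comment of the patch above, the following standalone C sketch splits a 48-bit address into its PML4E, PDPE, PDE, PTE and page-offset fields. Only the shift values (39, 30, 21, 12) and the 9-bit 0x1ff mask come from the patch; the struct, the macro names and va48_split() are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values and index width as given in the patch's layout comment. */
#define VA48_PML4E_SHIFT 39
#define VA48_PDPE_SHIFT  30
#define VA48_PDE_SHIFT   21
#define VA48_PTE_SHIFT   12
#define VA48_INDEX_MASK  0x1ffu /* 9 bits: 512 entries per level */
#define VA48_OFFSET_MASK 0xfffu /* 12 bits: offset inside a 4 KiB page */

/* Hypothetical container for illustration only; not a driver type. */
struct va48_indices {
	unsigned pml4e;
	unsigned pdpe;
	unsigned pde;
	unsigned pte;
	unsigned offset;
};

/* Split a 48-bit address into the five fields of the 4-level layout. */
static struct va48_indices va48_split(uint64_t addr)
{
	struct va48_indices idx;

	/* Only bits 47:0 take part in the translation. */
	assert(addr < (1ULL << 48));

	idx.pml4e  = (unsigned)((addr >> VA48_PML4E_SHIFT) & VA48_INDEX_MASK);
	idx.pdpe   = (unsigned)((addr >> VA48_PDPE_SHIFT) & VA48_INDEX_MASK);
	idx.pde    = (unsigned)((addr >> VA48_PDE_SHIFT) & VA48_INDEX_MASK);
	idx.pte    = (unsigned)((addr >> VA48_PTE_SHIFT) & VA48_INDEX_MASK);
	idx.offset = (unsigned)(addr & VA48_OFFSET_MASK);

	return idx;
}
```

For example, va48_split(1ULL << 32) gives a PDPE index of 4, the first index beyond the four legacy PDPEs, which is why the PDPE mask has to widen from 0x3 to 0x1ff once 48b addressing is in use.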
* [PATCH v9 09/19] drm/i915/gen8: Pass sg_iter through pte inserts
2015-07-29 16:23 ` [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
2015-07-30 4:19 ` Goel, Akash
@ 2015-08-03 8:52 ` Michel Thierry
1 sibling, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-03 8:52 UTC (permalink / raw)
To: intel-gfx
As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function only understands up to the page directory granularity.
An object's pages may span the page directory, and so using the iter
directly as we write the PTEs allows the iterator to stay coherent
through a VMA insert operation spanning multiple page table levels.
v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).
v4: Rebase.
Reviewed-by: Akash Goel <akash.goel@intel.com> (v3)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ea35935..31fc672 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -746,7 +746,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
static void
gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
struct i915_page_directory_pointer *pdp,
- struct sg_table *pages,
+ struct sg_page_iter *sg_iter,
uint64_t start,
enum i915_cache_level cache_level)
{
@@ -756,11 +756,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
- struct sg_page_iter sg_iter;
pt_vaddr = NULL;
- for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+ while (__sg_page_iter_next(sg_iter)) {
if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
break;
@@ -771,7 +770,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
}
pt_vaddr[pte] =
- gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+ gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
cache_level, true);
if (++pte == GEN8_PTES) {
kunmap_px(ppgtt, pt_vaddr);
@@ -797,8 +796,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+ struct sg_page_iter sg_iter;
- gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+ __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+ gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
}
static void gen8_free_page_tables(struct drm_device *dev,
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
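The idea behind the patch above, letting the caller own the iterator and handing it to the insert helper by pointer, can be sketched outside the kernel. This is a userspace analogy under stated assumptions: struct page_cursor and fill_table() are hypothetical stand-ins for struct sg_page_iter and gen8_ppgtt_insert_pte_entries(), and the "pages" are plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sg page iterator: a cursor over an array. */
struct page_cursor {
	const int *pages; /* backing "pages" to be mapped */
	size_t count;     /* total number of pages */
	size_t next;      /* index of the next page to hand out */
};

/*
 * Hypothetical stand-in for one insert call: fill at most `slots` entries
 * of `table` from the shared cursor and return how many were written.
 * Because the cursor is passed by pointer, a later call resumes exactly
 * where this one stopped.
 */
static size_t fill_table(struct page_cursor *cur, int *table, size_t slots)
{
	size_t used = 0;

	assert(cur != NULL && table != NULL);

	while (used < slots && cur->next < cur->count)
		table[used++] = cur->pages[cur->next++];

	return used;
}
```

Calling fill_table() twice with the same cursor, once per destination table, consumes the page list exactly once and in order. If each call built its own iterator from the page list instead, every call would start again from the first page.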
* [PATCH v9 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-07-29 16:23 ` [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-07-30 4:50 ` Goel, Akash
@ 2015-08-03 8:53 ` Michel Thierry
2015-08-03 9:23 ` Goel, Akash
2015-08-05 15:46 ` Daniel Vetter
1 sibling, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-03 8:53 UTC (permalink / raw)
To: intel-gfx; +Cc: Akash Goel
When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.
Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".
v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.
v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
adding an extra clamp function; remove unnecessary pdp_start/pdp_len
variables (Akash).
v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
length (Akash).
v10: Defer removing the pdp warning check in gen8_ppgtt_insert_pte_entries
until this commit (Akash).
Reviewed-by: Akash Goel <akash.goel@intel.com> (v9)
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++------------
1 file changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 31fc672..d5ae5de 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -681,9 +681,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
- unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
- unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
- unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+ unsigned pdpe = gen8_pdpe_index(start);
+ unsigned pde = gen8_pde_index(start);
+ unsigned pte = gen8_pte_index(start);
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
@@ -722,7 +722,8 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
pte = 0;
if (++pde == I915_PDES) {
- pdpe++;
+ if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+ break;
pde = 0;
}
}
@@ -735,12 +736,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
I915_CACHE_LLC, use_scratch);
- gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
+ scratch_pte);
+ } else {
+ uint64_t templ4, pml4e;
+ struct i915_page_directory_pointer *pdp;
+
+ gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+ gen8_ppgtt_clear_pte_range(vm, pdp, start, length,
+ scratch_pte);
+ }
+ }
}
static void
@@ -753,16 +763,13 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
- unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
- unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
- unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+ unsigned pdpe = gen8_pdpe_index(start);
+ unsigned pde = gen8_pde_index(start);
+ unsigned pte = gen8_pte_index(start);
pt_vaddr = NULL;
while (__sg_page_iter_next(sg_iter)) {
- if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
- break;
-
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -776,7 +783,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
kunmap_px(ppgtt, pt_vaddr);
pt_vaddr = NULL;
if (++pde == I915_PDES) {
- pdpe++;
+ if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+ break;
pde = 0;
}
pte = 0;
@@ -795,11 +803,23 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct sg_page_iter sg_iter;
__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
- gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
+
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
+ cache_level);
+ } else {
+ struct i915_page_directory_pointer *pdp;
+ uint64_t templ4, pml4e;
+ uint64_t length = (uint64_t)pages->orig_nents << PAGE_SHIFT;
+
+ gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+ gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
+ start, cache_level);
+ }
+ }
}
static void gen8_free_page_tables(struct drm_device *dev,
--
2.5.0
^ permalink raw reply related [flat|nested] 82+ messages in thread
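The 48b branches in the patch above walk a range one PML4 entry at a time. The definition of gen8_for_each_pml4e is not part of this message, so the sketch below only assumes that each iteration covers the part of [start, start + length) that falls inside one 512 GiB PML4E slot. The helper names are hypothetical; the shift of 39 is GEN8_PML4E_SHIFT from patch 06.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PML4E_SHIFT 39
#define SKETCH_PML4E_SIZE  (1ULL << SKETCH_PML4E_SHIFT) /* 512 GiB per entry */

/*
 * Hypothetical helper: number of bytes of [start, start + length) that lie
 * inside the PML4E slot containing `start`.
 */
static uint64_t pml4e_chunk_len(uint64_t start, uint64_t length)
{
	uint64_t slot_end = (start & ~(SKETCH_PML4E_SIZE - 1)) + SKETCH_PML4E_SIZE;
	uint64_t room = slot_end - start;

	return length < room ? length : room;
}

/* Count how many PML4 entries a range touches by walking it slot by slot. */
static unsigned pml4es_touched(uint64_t start, uint64_t length)
{
	unsigned n = 0;

	/* The range must be non-empty and fit in the 48-bit address space. */
	assert(length > 0);
	assert(start < (1ULL << 48) && length <= (1ULL << 48) - start);

	while (length) {
		uint64_t chunk = pml4e_chunk_len(start, length);

		start += chunk;
		length -= chunk;
		n++;
	}

	return n;
}
```

A two-page range that starts one page below a 512 GiB boundary touches two PML4 entries, so the per-PDP helper has to be called twice, each time stopping at the end of its own PDP. That matches the `++pdpe == I915_PDPES_PER_PDP(vm->dev)` break added in the patch.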
* Re: [PATCH v9 06/19] drm/i915/gen8: Add PML4 structure
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
@ 2015-08-03 9:20 ` Goel, Akash
0 siblings, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-08-03 9:20 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 8/3/2015 2:22 PM, Michel Thierry wrote:
> Introduces the Page Map Level 4 (PML4), i.e. the new top-level structure
> of the page tables.
>
> To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
>
> v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
> 32/64-bit safe (Chris).
> v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
> v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
> v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
> v6: Keep _insert_pte_entries changes outside this patch (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 3 ++-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 27 +++++++++++++++++----------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++++++++++-----
> 3 files changed, 40 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 04aa34a..4729eaf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2498,7 +2498,8 @@ struct drm_i915_cmd_table {
> #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
> #define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
> #define USES_PPGTT(dev) (i915.enable_ppgtt)
> -#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
> +#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
> +#define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
>
> #define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
> #define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7f71746..e099c18 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1105,14 +1105,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
> return ret;
>
> ppgtt->base.start = 0;
> - ppgtt->base.total = 1ULL << 32;
> - if (IS_ENABLED(CONFIG_X86_32))
> - /* While we have a proliferation of size_t variables
> - * we cannot represent the full ppgtt size on 32bit,
> - * so limit it to the same size as the GGTT (currently
> - * 2GiB).
> - */
> - ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> ppgtt->base.cleanup = gen8_ppgtt_cleanup;
> ppgtt->base.allocate_va_range = gen8_alloc_va_range;
> ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
> @@ -1122,10 +1114,25 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> - ret = __pdp_init(false, &ppgtt->pdp);
> + if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> + ret = __pdp_init(false, &ppgtt->pdp);
>
> - if (ret)
> + if (ret)
> + goto free_scratch;
> +
> + ppgtt->base.total = 1ULL << 32;
> + if (IS_ENABLED(CONFIG_X86_32))
> + /* While we have a proliferation of size_t variables
> + * we cannot represent the full ppgtt size on 32bit,
> + * so limit it to the same size as the GGTT (currently
> + * 2GiB).
> + */
> + ppgtt->base.total = to_i915(ppgtt->base.dev)->gtt.base.total;
> + } else {
> + ppgtt->base.total = 1ULL << 48;
> + ret = -EPERM; /* Not yet implemented */
> goto free_scratch;
> + }
>
> return 0;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 87e389c..04bc66f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
> * PDPE | PDE | PTE | offset
> * The difference as compared to normal x86 3 level page table is the PDPEs are
> * programmed via register.
> + *
> + * GEN8 48b legacy style address is defined as a 4 level page table:
> + * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
> + * PML4E | PDPE | PDE | PTE | offset
> */
> +#define GEN8_PML4ES_PER_PML4 512
> +#define GEN8_PML4E_SHIFT 39
> #define GEN8_PDPE_SHIFT 30
> -#define GEN8_PDPE_MASK 0x3
> +/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
> + * tables */
> +#define GEN8_PDPE_MASK 0x1ff
> #define GEN8_PDE_SHIFT 21
> #define GEN8_PDE_MASK 0x1ff
> #define GEN8_PTE_SHIFT 12
> @@ -98,8 +106,8 @@ typedef uint64_t gen8_pde_t;
> #define GEN8_LEGACY_PDPES 4
> #define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
>
> -/* FIXME: Next patch will use dev */
> -#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
> +#define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
> + GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
>
> #define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
> #define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
> @@ -250,6 +258,13 @@ struct i915_page_directory_pointer {
> struct i915_page_directory **page_directory;
> };
>
> +struct i915_pml4 {
> + struct i915_page_dma base;
> +
> + DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
> + struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
> +};
> +
> struct i915_address_space {
> struct drm_mm mm;
> struct drm_device *dev;
> @@ -345,8 +360,9 @@ struct i915_hw_ppgtt {
> struct drm_mm_node node;
> unsigned long pd_dirty_rings;
> union {
> - struct i915_page_directory_pointer pdp;
> - struct i915_page_directory pd;
> + struct i915_pml4 pml4; /* GEN8+ & 48b PPGTT */
> + struct i915_page_directory_pointer pdp; /* GEN8+ */
> + struct i915_page_directory pd; /* GEN6-7 */
> };
>
> struct drm_i915_file_private *file_priv;
>
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v9 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-08-03 8:53 ` [PATCH v9 " Michel Thierry
@ 2015-08-03 9:23 ` Goel, Akash
2015-08-05 15:46 ` Daniel Vetter
1 sibling, 0 replies; 82+ messages in thread
From: Goel, Akash @ 2015-08-03 9:23 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Reviewed the patch & it looks fine.
Reviewed-by: Akash Goel <akash.goel@intel.com>
On 8/3/2015 2:23 PM, Michel Thierry wrote:
> When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
> Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
> it will write to.
>
> Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
>
> This patch was inspired by Ben's "Depend exclusively on map and
> unmap_vma".
>
> v2: Rebase after s/page_tables/page_table/.
> v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
> clamp_pdp in gen8_ppgtt_insert_entries (Akash).
> v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
> maintain symmetry with gen8_ppgtt_insert_entries (Akash).
> v5: Do not mix pages and bytes in insert_entries (Akash).
> v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
> once.
> v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> Use gen8_px_index functions, and remove unnecessary number of pages
> parameter in insert_pte_entries.
> v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
> adding an extra clamp function; remove unnecessary pdp_start/pdp_len
> variables (Akash).
> v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
> length (Akash).
> v10: Defer removing the pdp warning check in gen8_ppgtt_insert_pte_entries
> until this commit (Akash).
>
> Reviewed-by: Akash Goel <akash.goel@intel.com> (v9)
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++------------
> 1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 31fc672..d5ae5de 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -681,9 +681,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> - unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> - unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> - unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> + unsigned pdpe = gen8_pdpe_index(start);
> + unsigned pde = gen8_pde_index(start);
> + unsigned pte = gen8_pte_index(start);
> unsigned num_entries = length >> PAGE_SHIFT;
> unsigned last_pte, i;
>
> @@ -722,7 +722,8 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
>
> pte = 0;
> if (++pde == I915_PDES) {
> - pdpe++;
> + if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> + break;
> pde = 0;
> }
> }
> @@ -735,12 +736,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> -
> gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> I915_CACHE_LLC, use_scratch);
>
> - gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
> + scratch_pte);
> + } else {
> + uint64_t templ4, pml4e;
> + struct i915_page_directory_pointer *pdp;
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + gen8_ppgtt_clear_pte_range(vm, pdp, start, length,
> + scratch_pte);
> + }
> + }
> }
>
> static void
> @@ -753,16 +763,13 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> - unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> - unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> - unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> + unsigned pdpe = gen8_pdpe_index(start);
> + unsigned pde = gen8_pde_index(start);
> + unsigned pte = gen8_pte_index(start);
>
> pt_vaddr = NULL;
>
> while (__sg_page_iter_next(sg_iter)) {
> - if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
> - break;
> -
> if (pt_vaddr == NULL) {
> struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> @@ -776,7 +783,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> kunmap_px(ppgtt, pt_vaddr);
> pt_vaddr = NULL;
> if (++pde == I915_PDES) {
> - pdpe++;
> + if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> + break;
> pde = 0;
> }
> pte = 0;
> @@ -795,11 +803,23 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> struct sg_page_iter sg_iter;
>
> __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
> - gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
> +
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
> + cache_level);
> + } else {
> + struct i915_page_directory_pointer *pdp;
> + uint64_t templ4, pml4e;
> + uint64_t length = (uint64_t)pages->orig_nents << PAGE_SHIFT;
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
> + start, cache_level);
> + }
> + }
> }
>
> static void gen8_free_page_tables(struct drm_device *dev,
>
^ permalink raw reply [flat|nested] 82+ messages in thread
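The v6 changelog entry quoted above ("Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at once") is the reason for the (uint64_t) cast in the length calculation. A minimal sketch of the effect, assuming a 32-bit count of 4 KiB pages; the function and macro names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* 4 KiB pages */

/* Widen before shifting: the byte count is computed in 64 bits. */
static uint64_t pages_to_bytes(uint32_t nr_pages)
{
	return (uint64_t)nr_pages << SKETCH_PAGE_SHIFT;
}

/*
 * Shift in 32 bits: bits shifted past bit 31 are discarded (unsigned
 * wraparound), so sizes of 4 GiB and above come out wrong.
 */
static uint32_t pages_to_bytes_truncated(uint32_t nr_pages)
{
	return nr_pages << SKETCH_PAGE_SHIFT;
}
```

With 1u << 20 pages (4 GiB), the widened version returns 1ULL << 32 while the 32-bit version returns 0, which would make the per-PML4E loop see an empty range.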
* Re: [PATCH v6 00/19] 48-bit PPGTT
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
` (19 preceding siblings ...)
2015-07-30 11:26 ` [PATCH v6 00/19] 48-bit PPGTT Chris Wilson
@ 2015-08-03 9:51 ` Michel Thierry
20 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-03 9:51 UTC (permalink / raw)
To: intel-gfx@lists.freedesktop.org, Daniel Vetter; +Cc: Goel, Akash
On 7/29/2015 5:23 PM, Michel Thierry wrote:
> Michel Thierry (19):
> drm/i915: Remove unnecessary gen8_clamp_pd
> drm/i915/gen8: Make pdp allocation more dynamic
> drm/i915/gen8: Abstract PDP usage
> drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
> drm/i915/gen8: Add dynamic page trace events
> drm/i915/gen8: Add PML4 structure
> drm/i915/gen8: implement alloc/free for 4lvl
> drm/i915/gen8: Add 4 level switching infrastructure and lrc support
> drm/i915/gen8: Pass sg_iter through pte inserts
> drm/i915/gen8: Add 4 level support in insert_entries and clear_range
> drm/i915/gen8: Initialize PDPs and PML4
> drm/i915: Expand error state's address width to 64b
> drm/i915/gen8: Add ppgtt info and debug_dump
> drm/i915: object size needs to be u64
> drm/i915: batch_obj vm offset must be u64
> drm/i915/userptr: Kill user_size limit check
> drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
> drm/i915/gen8: Flip the 48b switch
> drm/i915: Save some page table setup on repeated binds
>
> drivers/gpu/drm/i915/i915_debugfs.c | 18 +-
> drivers/gpu/drm/i915/i915_drv.h | 11 +-
> drivers/gpu/drm/i915/i915_gem.c | 30 +-
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +
> drivers/gpu/drm/i915/i915_gem_gtt.c | 665 ++++++++++++++++++++++++-----
> drivers/gpu/drm/i915/i915_gem_gtt.h | 64 ++-
> drivers/gpu/drm/i915/i915_gem_userptr.c | 4 -
> drivers/gpu/drm/i915/i915_gpu_error.c | 24 +-
> drivers/gpu/drm/i915/i915_params.c | 2 +-
> drivers/gpu/drm/i915/i915_reg.h | 1 +
> drivers/gpu/drm/i915/i915_trace.h | 32 +-
> drivers/gpu/drm/i915/intel_lrc.c | 60 ++-
> include/uapi/drm/i915_drm.h | 3 +-
> 13 files changed, 747 insertions(+), 180 deletions(-)
>
> --
> 2.4.5
>
Hi Daniel,
Finally all the patches have Akash's r-b.
Since there were still some small changes by him and Chris, I addressed
them individually (instead of resending the whole series one more time).
Below are the msg-ids of the latest version of each patch, in case there
is any doubt about which ones to merge.
Note, the last patch (drm/i915: Save some page table setup on repeated
binds) is an optimization Akash recommended. That's why he didn't review
it. Do you have someone in mind to check it? Or should I ask around for
volunteers?
Thanks,
-Michel
[01/19] drm/i915: Remove unnecessary gen8_clamp_pd
1438187043-34267-2-git-send-email-michel.thierry@intel.com
[02/19] drm/i915/gen8: Make pdp allocation more dynamic
1438187043-34267-3-git-send-email-michel.thierry@intel.com
[03/19] drm/i915/gen8: Abstract PDP usage
1438250523-22533-1-git-send-email-michel.thierry@intel.com
[04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
1438250569-22618-1-git-send-email-michel.thierry@intel.com
[05/19] drm/i915/gen8: Add dynamic page trace events
1438187043-34267-6-git-send-email-michel.thierry@intel.com
[06/19] drm/i915/gen8: Add PML4 structure
1438591921-3087-1-git-send-email-michel.thierry@intel.com
[07/19] drm/i915/gen8: implement alloc/free for 4lvl
1438250729-22955-1-git-send-email-michel.thierry@intel.com
[08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
1438250783-23118-1-git-send-email-michel.thierry@intel.com
[09/19] drm/i915/gen8: Pass sg_iter through pte inserts
1438591967-3249-1-git-send-email-michel.thierry@intel.com
[10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
1438592007-3354-1-git-send-email-michel.thierry@intel.com
[11/19] drm/i915/gen8: Initialize PDPs and PML4
1438187043-34267-12-git-send-email-michel.thierry@intel.com
[12/19] drm/i915: Expand error state's address width to 64b
1438187043-34267-13-git-send-email-michel.thierry@intel.com
[13/19] drm/i915/gen8: Add ppgtt info and debug_dump
1438187043-34267-14-git-send-email-michel.thierry@intel.com
[14/19] drm/i915: object size needs to be u64
1438187043-34267-15-git-send-email-michel.thierry@intel.com
[15/19] drm/i915: batch_obj vm offset must be u64
1438187043-34267-16-git-send-email-michel.thierry@intel.com
[16/19] drm/i915/userptr: Kill user_size limit check
1438187043-34267-17-git-send-email-michel.thierry@intel.com
[17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
1438187043-34267-18-git-send-email-michel.thierry@intel.com
[18/19] drm/i915/gen8: Flip the 48b switch
1438346110-18985-1-git-send-email-michel.thierry@intel.com
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic
2015-07-29 16:23 ` [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
2015-07-30 3:18 ` Goel, Akash
@ 2015-08-05 15:31 ` Daniel Vetter
2015-08-05 15:49 ` Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Daniel Vetter @ 2015-08-05 15:31 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Jul 29, 2015 at 05:23:46PM +0100, Michel Thierry wrote:
> This transitional patch doesn't do much for the existing code. However,
> it should make upcoming patches to use the full 48b address space a bit
> easier.
commit message should also mention how exactly it's more dynamic and why
exactly that's useful ... It's ofc possible to infer that from the
context, but that won't be the case any more if you look at the patch
alone (with git blame or after a bisect). Please follow up with a few
words so I can add them to the commit message.
-Daniel
>
> v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
> v3: To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
> v4: Rebase after s/page_tables/page_table/, added extra information
> about 4-level page table formats and use IS_ENABLED macro.
> v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
> v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
> follow his nomenclature in pdp functions (there is no alloc_pdp yet).
> v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
> v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
> v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
> v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 86 +++++++++++++++++++++++++++++--------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 17 +++++---
> 2 files changed, 80 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 189572d..28f3227 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -522,6 +522,43 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
> fill_px(vm->dev, pd, scratch_pde);
> }
>
> +static int __pdp_init(struct drm_device *dev,
> + struct i915_page_directory_pointer *pdp)
> +{
> + size_t pdpes = I915_PDPES_PER_PDP(dev);
> +
> + pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
> + sizeof(unsigned long),
> + GFP_KERNEL);
> + if (!pdp->used_pdpes)
> + return -ENOMEM;
> +
> + pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
> + GFP_KERNEL);
> + if (!pdp->page_directory) {
> + kfree(pdp->used_pdpes);
> + /* the PDP might be the statically allocated top level. Keep it
> + * as clean as possible */
> + pdp->used_pdpes = NULL;
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static void __pdp_fini(struct i915_page_directory_pointer *pdp)
> +{
> + kfree(pdp->used_pdpes);
> + kfree(pdp->page_directory);
> + pdp->page_directory = NULL;
> +}
> +
> +static void free_pdp(struct drm_device *dev,
> + struct i915_page_directory_pointer *pdp)
> +{
> + __pdp_fini(pdp);
> +}
> +
> /* Broadwell Page Directory Pointer Descriptors */
> static int gen8_write_pdp(struct drm_i915_gem_request *req,
> unsigned entry,
> @@ -720,7 +757,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> container_of(vm, struct i915_hw_ppgtt, base);
> int i;
>
> - for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
> + for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> + I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> continue;
>
> @@ -729,6 +767,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
> }
>
> + free_pdp(ppgtt->base.dev, &ppgtt->pdp);
> gen8_free_scratch(vm);
> }
>
> @@ -763,7 +802,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
>
> gen8_for_each_pde(pt, pd, start, length, temp, pde) {
> /* Don't reallocate page tables */
> - if (pt) {
> + if (test_bit(pde, pd->used_pdes)) {
> /* Scratch is never allocated this way */
> WARN_ON(pt == ppgtt->base.scratch_pt);
> continue;
> @@ -820,11 +859,12 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> struct i915_page_directory *pd;
> uint64_t temp;
> uint32_t pdpe;
> + uint32_t pdpes = I915_PDPES_PER_PDP(dev);
>
> - WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
> + WARN_ON(!bitmap_empty(new_pds, pdpes));
>
> gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> - if (pd)
> + if (test_bit(pdpe, pdp->used_pdpes))
> continue;
>
> pd = alloc_pd(dev);
> @@ -839,18 +879,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> return 0;
>
> unwind_out:
> - for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
> + for_each_set_bit(pdpe, new_pds, pdpes)
> free_pd(dev, pdp->page_directory[pdpe]);
>
> return -ENOMEM;
> }
>
> static void
> -free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
> +free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
> + uint32_t pdpes)
> {
> int i;
>
> - for (i = 0; i < GEN8_LEGACY_PDPES; i++)
> + for (i = 0; i < pdpes; i++)
> kfree(new_pts[i]);
> kfree(new_pts);
> kfree(new_pds);
> @@ -861,23 +902,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
> */
> static
> int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> - unsigned long ***new_pts)
> + unsigned long ***new_pts,
> + uint32_t pdpes)
> {
> int i;
> unsigned long *pds;
> unsigned long **pts;
>
> - pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
> + pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
> if (!pds)
> return -ENOMEM;
>
> - pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
> + pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
> if (!pts) {
> kfree(pds);
> return -ENOMEM;
> }
>
> - for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
> + for (i = 0; i < pdpes; i++) {
> pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
> sizeof(unsigned long), GFP_KERNEL);
> if (!pts[i])
> @@ -890,7 +932,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
> return 0;
>
> err_out:
> - free_gen8_temp_bitmaps(pds, pts);
> + free_gen8_temp_bitmaps(pds, pts, pdpes);
> return -ENOMEM;
> }
>
> @@ -916,6 +958,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> const uint64_t orig_length = length;
> uint64_t temp;
> uint32_t pdpe;
> + uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
> int ret;
>
> /* Wrap is never okay since we can only represent 48b, and we don't
> @@ -927,7 +970,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> if (WARN_ON(start + length > ppgtt->base.total))
> return -ENODEV;
>
> - ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
> + ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
> if (ret)
> return ret;
>
> @@ -935,7 +978,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
> new_page_dirs);
> if (ret) {
> - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> return ret;
> }
>
> @@ -989,7 +1032,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> __set_bit(pdpe, ppgtt->pdp.used_pdpes);
> }
>
> - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> mark_tlbs_dirty(ppgtt);
> return 0;
>
> @@ -999,10 +1042,10 @@ err_out:
> free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
> }
>
> - for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
> + for_each_set_bit(pdpe, new_page_dirs, pdpes)
> free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
>
> - free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
> + free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> mark_tlbs_dirty(ppgtt);
> return ret;
> }
> @@ -1040,7 +1083,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
>
> ppgtt->switch_mm = gen8_mm_switch;
>
> + ret = __pdp_init(false, &ppgtt->pdp);
> +
> + if (ret)
> + goto free_scratch;
> +
> return 0;
> +
> +free_scratch:
> + gen8_free_scratch(&ppgtt->base);
> + return ret;
> }
>
> static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index d5bf953..87e389c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -98,6 +98,9 @@ typedef uint64_t gen8_pde_t;
> #define GEN8_LEGACY_PDPES 4
> #define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
>
> +/* FIXME: Next patch will use dev */
> +#define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
> +
> #define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
> #define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
> #define PPAT_CACHED_INDEX _PAGE_PAT /* WB LLCeLLC */
> @@ -241,9 +244,10 @@ struct i915_page_directory {
> };
>
> struct i915_page_directory_pointer {
> - /* struct page *page; */
> - DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
> - struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
> + struct i915_page_dma base;
> +
> + unsigned long *used_pdpes;
> + struct i915_page_directory **page_directory;
> };
>
> struct i915_address_space {
> @@ -436,9 +440,10 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
> temp = min(temp, length), \
> start += temp, length -= temp)
>
> -#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
> - for (iter = gen8_pdpe_index(start); \
> - pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES; \
> +#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
> + for (iter = gen8_pdpe_index(start); \
> + pd = (pdp)->page_directory[iter], \
> + length > 0 && (iter < I915_PDPES_PER_PDP(dev)); \
> iter++, \
> temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start, \
> temp = min(temp, length), \
> --
> 2.4.5
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v7 03/19] drm/i915/gen8: Abstract PDP usage
2015-07-31 4:11 ` Goel, Akash
@ 2015-08-05 15:33 ` Daniel Vetter
0 siblings, 0 replies; 82+ messages in thread
From: Daniel Vetter @ 2015-08-05 15:33 UTC (permalink / raw)
To: Goel, Akash; +Cc: intel-gfx
On Fri, Jul 31, 2015 at 09:41:11AM +0530, Goel, Akash wrote:
> Reviewed the patch & it looks fine.
> Reviewed-by: "Akash Goel <akash.goel@intel.com>"
Just an aside: the "" is not required in your tag here and actually breaks
it as a proper mail address. Quotes, if needed, should wrap only the name
and must not include the mail address itself.
-Daniel
>
>
> On 7/30/2015 3:32 PM, Michel Thierry wrote:
> >Up until now, ppgtt->pdp has always been the root of our page tables.
> >Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
> >
> >In preparation for 4 level page tables, we need to stop using ppgtt->pdp
> >directly unless we know it's what we want. The future structure will use
> >ppgtt->pml4 for the top level, and the pdp is just one of the entries
> >being pointed to by a pml4e. The temporary pdp local variable will be
> >removed once the rest of the 4-level code lands.
> >
> >Also, start passing the vm pointer to the alloc functions, instead of
> >ppgtt.
> >
> >v2: Updated after dynamic page allocation changes.
> >v3: Rebase after s/page_tables/page_table/.
> >v4: Rebase after changes in "Dynamic page table allocations" patch.
> >v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> >v6: Rebase after final merged version of Mika's ppgtt/scratch patches.
> >v7: Keep pagetable map in-line (and avoid unnecessary for_each_pde
> >loops), remove redundant ppgtt pointer in _alloc_pagetabs (Akash)
> >v8: Fix text indentation in _alloc_pagetabs/page_directories (Chris)
> >v9: Defer gen8_alloc_va_range_4lvl definition until 4lvl is implemented,
> >clean-up gen8_ppgtt_cleanup [pun intended] (Akash).
> >v10: Clean-up commit message (Akash).
> >
> >Cc: Akash Goel <akash.goel@intel.com>
> >Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> >Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
> >---
> > drivers/gpu/drm/i915/i915_gem_gtt.c | 84 +++++++++++++++++++------------------
> > 1 file changed, 44 insertions(+), 40 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >index 28f3227..bd56979 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >@@ -607,6 +607,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> > {
> > struct i915_hw_ppgtt *ppgtt =
> > container_of(vm, struct i915_hw_ppgtt, base);
> >+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> > gen8_pte_t *pt_vaddr, scratch_pte;
> > unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> >@@ -621,10 +622,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> > struct i915_page_directory *pd;
> > struct i915_page_table *pt;
> >
> >- if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
> >+ if (WARN_ON(!pdp->page_directory[pdpe]))
> > break;
> >
> >- pd = ppgtt->pdp.page_directory[pdpe];
> >+ pd = pdp->page_directory[pdpe];
> >
> > if (WARN_ON(!pd->page_table[pde]))
> > break;
> >@@ -662,6 +663,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> > {
> > struct i915_hw_ppgtt *ppgtt =
> > container_of(vm, struct i915_hw_ppgtt, base);
> >+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> > gen8_pte_t *pt_vaddr;
> > unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> > unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> >@@ -675,7 +677,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> > break;
> >
> > if (pt_vaddr == NULL) {
> >- struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
> >+ struct i915_page_directory *pd = pdp->page_directory[pdpe];
> > struct i915_page_table *pt = pd->page_table[pde];
> > pt_vaddr = kmap_px(pt);
> > }
> >@@ -755,28 +757,29 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> > {
> > struct i915_hw_ppgtt *ppgtt =
> > container_of(vm, struct i915_hw_ppgtt, base);
> >+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> >+ struct drm_device *dev = ppgtt->base.dev;
> > int i;
> >
> >- for_each_set_bit(i, ppgtt->pdp.used_pdpes,
> >- I915_PDPES_PER_PDP(ppgtt->base.dev)) {
> >- if (WARN_ON(!ppgtt->pdp.page_directory[i]))
> >+ for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
> >+ if (WARN_ON(!pdp->page_directory[i]))
> > continue;
> >
> >- gen8_free_page_tables(ppgtt->base.dev,
> >- ppgtt->pdp.page_directory[i]);
> >- free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
> >+ gen8_free_page_tables(dev, pdp->page_directory[i]);
> >+ free_pd(dev, pdp->page_directory[i]);
> > }
> >
> >- free_pdp(ppgtt->base.dev, &ppgtt->pdp);
> >+ free_pdp(dev, pdp);
> >+
> > gen8_free_scratch(vm);
> > }
> >
> > /**
> > * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
> >- * @ppgtt: Master ppgtt structure.
> >- * @pd: Page directory for this address range.
> >+ * @vm: Master vm structure.
> >+ * @pd: Page directory for this address range.
> > * @start: Starting virtual address to begin allocations.
> >- * @length Size of the allocations.
> >+ * @length: Size of the allocations.
> > * @new_pts: Bitmap set by function with new allocations. Likely used by the
> > * caller to free on error.
> > *
> >@@ -789,13 +792,13 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
> > *
> > * Return: 0 if success; negative error code otherwise.
> > */
> >-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> >+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
> > struct i915_page_directory *pd,
> > uint64_t start,
> > uint64_t length,
> > unsigned long *new_pts)
> > {
> >- struct drm_device *dev = ppgtt->base.dev;
> >+ struct drm_device *dev = vm->dev;
> > struct i915_page_table *pt;
> > uint64_t temp;
> > uint32_t pde;
> >@@ -804,7 +807,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> > /* Don't reallocate page tables */
> > if (test_bit(pde, pd->used_pdes)) {
> > /* Scratch is never allocated this way */
> >- WARN_ON(pt == ppgtt->base.scratch_pt);
> >+ WARN_ON(pt == vm->scratch_pt);
> > continue;
> > }
> >
> >@@ -812,7 +815,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
> > if (IS_ERR(pt))
> > goto unwind_out;
> >
> >- gen8_initialize_pt(&ppgtt->base, pt);
> >+ gen8_initialize_pt(vm, pt);
> > pd->page_table[pde] = pt;
> > __set_bit(pde, new_pts);
> > }
> >@@ -828,11 +831,11 @@ unwind_out:
> >
> > /**
> > * gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
> >- * @ppgtt: Master ppgtt structure.
> >+ * @vm: Master vm structure.
> > * @pdp: Page directory pointer for this address range.
> > * @start: Starting virtual address to begin allocations.
> >- * @length Size of the allocations.
> >- * @new_pds Bitmap set by function with new allocations. Likely used by the
> >+ * @length: Size of the allocations.
> >+ * @new_pds: Bitmap set by function with new allocations. Likely used by the
> > * caller to free on error.
> > *
> > * Allocate the required number of page directories starting at the pde index of
> >@@ -849,13 +852,14 @@ unwind_out:
> > *
> > * Return: 0 if success; negative error code otherwise.
> > */
> >-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> >- struct i915_page_directory_pointer *pdp,
> >- uint64_t start,
> >- uint64_t length,
> >- unsigned long *new_pds)
> >+static int
> >+gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
> >+ struct i915_page_directory_pointer *pdp,
> >+ uint64_t start,
> >+ uint64_t length,
> >+ unsigned long *new_pds)
> > {
> >- struct drm_device *dev = ppgtt->base.dev;
> >+ struct drm_device *dev = vm->dev;
> > struct i915_page_directory *pd;
> > uint64_t temp;
> > uint32_t pdpe;
> >@@ -871,7 +875,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
> > if (IS_ERR(pd))
> > goto unwind_out;
> >
> >- gen8_initialize_pd(&ppgtt->base, pd);
> >+ gen8_initialize_pd(vm, pd);
> > pdp->page_directory[pdpe] = pd;
> > __set_bit(pdpe, new_pds);
> > }
> >@@ -947,18 +951,19 @@ static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
> > }
> >
> > static int gen8_alloc_va_range(struct i915_address_space *vm,
> >- uint64_t start,
> >- uint64_t length)
> >+ uint64_t start, uint64_t length)
> > {
> > struct i915_hw_ppgtt *ppgtt =
> > container_of(vm, struct i915_hw_ppgtt, base);
> > unsigned long *new_page_dirs, **new_page_tables;
> >+ struct drm_device *dev = vm->dev;
> >+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> > struct i915_page_directory *pd;
> > const uint64_t orig_start = start;
> > const uint64_t orig_length = length;
> > uint64_t temp;
> > uint32_t pdpe;
> >- uint32_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
> >+ uint32_t pdpes = I915_PDPES_PER_PDP(dev);
> > int ret;
> >
> > /* Wrap is never okay since we can only represent 48b, and we don't
> >@@ -967,7 +972,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> > if (WARN_ON(start + length < start))
> > return -ENODEV;
> >
> >- if (WARN_ON(start + length > ppgtt->base.total))
> >+ if (WARN_ON(start + length > vm->total))
> > return -ENODEV;
> >
> > ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
> >@@ -975,16 +980,16 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> > return ret;
> >
> > /* Do the allocations first so we can easily bail out */
> >- ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
> >- new_page_dirs);
> >+ ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length,
> >+ new_page_dirs);
> > if (ret) {
> > free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> > return ret;
> > }
> >
> > /* For every page directory referenced, allocate page tables */
> >- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> >- ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
> >+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> >+ ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
> > new_page_tables[pdpe]);
> > if (ret)
> > goto err_out;
> >@@ -995,7 +1000,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> >
> > /* Allocations have completed successfully, so set the bitmaps, and do
> > * the mappings. */
> >- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
> >+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
> > gen8_pde_t *const page_directory = kmap_px(pd);
> > struct i915_page_table *pt;
> > uint64_t pd_len = length;
> >@@ -1028,8 +1033,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> > }
> >
> > kunmap_px(ppgtt, page_directory);
> >-
> >- __set_bit(pdpe, ppgtt->pdp.used_pdpes);
> >+ __set_bit(pdpe, pdp->used_pdpes);
> > }
> >
> > free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> >@@ -1039,11 +1043,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
> > err_out:
> > while (pdpe--) {
> > for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
> >- free_pt(vm->dev, ppgtt->pdp.page_directory[pdpe]->page_table[temp]);
> >+ free_pt(dev, pdp->page_directory[pdpe]->page_table[temp]);
> > }
> >
> > for_each_set_bit(pdpe, new_page_dirs, pdpes)
> >- free_pd(vm->dev, ppgtt->pdp.page_directory[pdpe]);
> >+ free_pd(dev, pdp->page_directory[pdpe]);
> >
> > free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
> > mark_tlbs_dirty(ppgtt);
> >
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v9 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-08-03 8:53 ` [PATCH v9 " Michel Thierry
2015-08-03 9:23 ` Goel, Akash
@ 2015-08-05 15:46 ` Daniel Vetter
2015-08-05 16:13 ` Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Daniel Vetter @ 2015-08-05 15:46 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, Akash Goel
On Mon, Aug 03, 2015 at 09:53:27AM +0100, Michel Thierry wrote:
> When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
> Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
> it will write to.
>
> Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
>
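As a rough illustration of the lookup order described above (PML4 entry
first, then PDP entry, then PD and PT), here is a minimal userspace sketch
of splitting a 48-bit address into its four table indices. It is not the
driver code: the sketch_* names are made up for this example, and the
shifts assume 4 KiB pages with 512 entries (9 index bits) per level, which
matches the 512-entry PML4 mentioned in the cover letter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 12 offset bits + 4 levels * 9 index bits = 48 bits. */
#define SKETCH_PTE_SHIFT   12
#define SKETCH_PDE_SHIFT   21
#define SKETCH_PDPE_SHIFT  30
#define SKETCH_PML4E_SHIFT 39
#define SKETCH_INDEX_MASK  0x1ffu

/* Hypothetical container for the four indices of one address. */
struct sketch_indices {
	unsigned pml4e;
	unsigned pdpe;
	unsigned pde;
	unsigned pte;
};

/* Split a 48-bit GPU virtual address into per-level table indices. */
static struct sketch_indices sketch_decode(uint64_t addr)
{
	struct sketch_indices idx;

	/* Anything at or above 2^48 cannot be represented by 4 levels. */
	assert(addr < (1ULL << 48));

	idx.pml4e = (unsigned)((addr >> SKETCH_PML4E_SHIFT) & SKETCH_INDEX_MASK);
	idx.pdpe  = (unsigned)((addr >> SKETCH_PDPE_SHIFT) & SKETCH_INDEX_MASK);
	idx.pde   = (unsigned)((addr >> SKETCH_PDE_SHIFT) & SKETCH_INDEX_MASK);
	idx.pte   = (unsigned)((addr >> SKETCH_PTE_SHIFT) & SKETCH_INDEX_MASK);
	return idx;
}
```

With this layout an address at exactly 4GB decodes to pml4e 0 and pdpe 4,
i.e. the first entry past the four legacy PDPEs.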
> This patch was inspired by Ben's "Depend exclusively on map and
> unmap_vma".
>
> v2: Rebase after s/page_tables/page_table/.
> v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
> clamp_pdp in gen8_ppgtt_insert_entries (Akash).
> v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
> maintain symmetry with gen8_ppgtt_insert_entries (Akash).
> v5: Do not mix pages and bytes in insert_entries (Akash).
> v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
> once.
> v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
> Use gen8_px_index functions, and remove unnecessary number of pages
> parameter in insert_pte_entries.
> v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
> adding an extra clamp function; remove unnecessary pdp_start/pdp_len
> variables (Akash).
> v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
> length (Akash).
> v10: Remove pdp warning check in gen8_ppgtt_insert_pte_entries until this
> commit (Akash).
>
> Reviewed-by: Akash Goel <akash.goel@intel.com> (v9)
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++++++++++++++++++++++++------------
> 1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 31fc672..d5ae5de 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -681,9 +681,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> - unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> - unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> - unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> + unsigned pdpe = gen8_pdpe_index(start);
> + unsigned pde = gen8_pde_index(start);
> + unsigned pte = gen8_pte_index(start);
> unsigned num_entries = length >> PAGE_SHIFT;
> unsigned last_pte, i;
>
> @@ -722,7 +722,8 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
>
> pte = 0;
> if (++pde == I915_PDES) {
> - pdpe++;
> + if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> + break;
> pde = 0;
> }
> }
> @@ -735,12 +736,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> -
> gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
> I915_CACHE_LLC, use_scratch);
>
> - gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
Hm, this isn't pretty, and looking through earlier patches you have a lot
of if (48BIT) branches right at the topmost level where we have vfuncs,
e.g. gen8_ppgtt_cleanup. Imo it's much better to just do a
gen8_legacy_ppgtt_cleanup and a gen8_4lvl_ppgtt_cleanup. Yeah, that means
we duplicate the call to free_scratch, but really that's meh - we committed
to that abstraction so let's use it.
But reworking all the patches to get rid of all the 48bit ifs and
exploiting the vfunc abstraction we have will be a bit of work, so I'll
keep merging and sign you up for that follow-up task. The usual design
when we have vfuncs should be:
- do per-platform vfuncs everywhere you need a split
- for functionality shared between different vfuncs, extract common helper
functions and call them from both places.
E.g. for this case here I think we need a new 4lvl insert_entries
function which calls the existing one in a loop, and a 3lvl insert_entries
function which calls the existing one for the single legacy pdp. Make 2
copies of this and pull out the if to the top-level of each, then
simplify.
If we have abstraction in the form of vfuncs _and_ pile in lots of ifs at
low level then we pay both the price for the abstraction and the price for
tightly knit code, i.e. both downsides without an upside.
Anyway, I expect follow-up work here ;-)
Thanks, Daniel
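The vfunc design described above could look roughly like the following
standalone sketch. It is only an illustration under stated assumptions, not
i915 code: every sketch_* name is hypothetical, the shared helper merely
counts what it is asked to clear, and each PDP is assumed to span 2^39
bytes (512 PDPEs of 1 GiB each). The single 48-bit check happens once, when
the vfunc table is selected.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one PDP covers 512 entries of 1 GiB each. */
#define SKETCH_PDP_SPAN (1ULL << 39)

struct sketch_vm;

/* Per-layout vfunc table; only one operation is shown here. */
struct sketch_vm_ops {
	void (*clear_range)(struct sketch_vm *vm, uint64_t start,
			    uint64_t length);
};

struct sketch_vm {
	const struct sketch_vm_ops *ops;
	unsigned helper_calls;   /* how often the shared helper ran */
	uint64_t cleared_bytes;  /* total bytes handed to the helper */
};

/* Shared helper: operates on a range that lies within a single PDP. */
static void sketch_clear_pdp_range(struct sketch_vm *vm, uint64_t start,
				   uint64_t length)
{
	/* The range must not cross a PDP boundary. */
	assert(length == 0 ||
	       start / SKETCH_PDP_SPAN ==
	       (start + length - 1) / SKETCH_PDP_SPAN);
	vm->helper_calls++;
	vm->cleared_bytes += length;
}

/* Legacy layout: a single PDP, so the helper is called exactly once. */
static void sketch_legacy_clear_range(struct sketch_vm *vm, uint64_t start,
				      uint64_t length)
{
	sketch_clear_pdp_range(vm, start, length);
}

/* 4-level layout: walk the range one PDP at a time. */
static void sketch_4lvl_clear_range(struct sketch_vm *vm, uint64_t start,
				    uint64_t length)
{
	while (length > 0) {
		uint64_t pdp_end = (start / SKETCH_PDP_SPAN + 1) * SKETCH_PDP_SPAN;
		uint64_t chunk = pdp_end - start;

		if (chunk > length)
			chunk = length;
		sketch_clear_pdp_range(vm, start, chunk);
		start += chunk;
		length -= chunk;
	}
}

static const struct sketch_vm_ops sketch_legacy_ops = {
	.clear_range = sketch_legacy_clear_range,
};

static const struct sketch_vm_ops sketch_4lvl_ops = {
	.clear_range = sketch_4lvl_clear_range,
};

/* The only place that needs to know which layout is in use. */
static void sketch_vm_init(struct sketch_vm *vm, int use_48bit)
{
	vm->ops = use_48bit ? &sketch_4lvl_ops : &sketch_legacy_ops;
	vm->helper_calls = 0;
	vm->cleared_bytes = 0;
}
```

Callers then go through vm->ops->clear_range() without any layout check of
their own, and the low-level helper stays free of 48-bit conditionals.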
> + gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
> + scratch_pte);
> + } else {
> + uint64_t templ4, pml4e;
> + struct i915_page_directory_pointer *pdp;
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + gen8_ppgtt_clear_pte_range(vm, pdp, start, length,
> + scratch_pte);
> + }
> + }
> }
>
> static void
> @@ -753,16 +763,13 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> gen8_pte_t *pt_vaddr;
> - unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
> - unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
> - unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
> + unsigned pdpe = gen8_pdpe_index(start);
> + unsigned pde = gen8_pde_index(start);
> + unsigned pte = gen8_pte_index(start);
>
> pt_vaddr = NULL;
>
> while (__sg_page_iter_next(sg_iter)) {
> - if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
> - break;
> -
> if (pt_vaddr == NULL) {
> struct i915_page_directory *pd = pdp->page_directory[pdpe];
> struct i915_page_table *pt = pd->page_table[pde];
> @@ -776,7 +783,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
> kunmap_px(ppgtt, pt_vaddr);
> pt_vaddr = NULL;
> if (++pde == I915_PDES) {
> - pdpe++;
> + if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
> + break;
> pde = 0;
> }
> pte = 0;
> @@ -795,11 +803,23 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> {
> struct i915_hw_ppgtt *ppgtt =
> container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> struct sg_page_iter sg_iter;
>
> __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
> - gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
> +
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
> + cache_level);
> + } else {
> + struct i915_page_directory_pointer *pdp;
> + uint64_t templ4, pml4e;
> + uint64_t length = (uint64_t)pages->orig_nents << PAGE_SHIFT;
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter,
> + start, cache_level);
> + }
> + }
> }
>
> static void gen8_free_page_tables(struct drm_device *dev,
> --
> 2.5.0
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic
2015-08-05 15:31 ` Daniel Vetter
@ 2015-08-05 15:49 ` Michel Thierry
2015-08-05 15:51 ` Michel Thierry
2015-08-06 12:28 ` Daniel Vetter
0 siblings, 2 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-05 15:49 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On 8/5/2015 4:31 PM, Daniel Vetter wrote:
> On Wed, Jul 29, 2015 at 05:23:46PM +0100, Michel Thierry wrote:
>> This transitional patch doesn't do much for the existing code. However,
>> it should make upcoming patches to use the full 48b address space a bit
>> easier.
>
> The commit message should also mention how exactly it's more dynamic and
> why exactly that's useful. It's of course possible to infer that from the
> context, but that won't be the case any more if you look at the patch
> alone (with git blame or after a bisect). Please follow up with a few
> words so I can add them to the commit message.
> -Daniel
>
Hi Daniel,
Agree the description is vague. Here's the updated commit message, let
me know what you think (and if you want a new patch):
drm/i915/gen8: Make pdp allocation more dynamic
This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier.
32-bit ppgtt uses just 4 PDPEs (a single PDP with 4 entries), while 48-bit
ppgtt will have up to 512 per PDP; this patch prepares the existing
functions to query the right number of PDPEs at run-time. This also means
that used_pdpes must be allocated during ppgtt_init, as the bitmap size
depends on the ppgtt address range selected.
v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
follow his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
v11:
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
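To make the run-time sizing concrete, here is a minimal userspace sketch of
the idea, assuming nothing beyond what the commit message says: the entry
count is only known at init time, so both the used-entries bitmap and the
pointer array are heap-allocated from it. This is not the driver code; the
sketch_* names are hypothetical, and calloc/free stand in for the kernel's
kcalloc/kfree.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical stand-in for a page directory pointer structure. */
struct sketch_pdp {
	size_t pdpes;             /* entry count chosen at init time */
	unsigned long *used_pdpes; /* bitmap with one bit per entry */
	void **page_directory;     /* one slot per entry */
};

/* Allocate bitmap and slots for 'pdpes' entries (e.g. 4 or 512). */
static int sketch_pdp_init(struct sketch_pdp *pdp, size_t pdpes)
{
	pdp->pdpes = 0;
	pdp->page_directory = NULL;
	pdp->used_pdpes = calloc(SKETCH_BITS_TO_LONGS(pdpes),
				 sizeof(unsigned long));
	if (!pdp->used_pdpes)
		return -1;

	pdp->page_directory = calloc(pdpes, sizeof(*pdp->page_directory));
	if (!pdp->page_directory) {
		/* Leave the structure clean on failure. */
		free(pdp->used_pdpes);
		pdp->used_pdpes = NULL;
		return -1;
	}

	pdp->pdpes = pdpes;
	return 0;
}

static void sketch_pdp_fini(struct sketch_pdp *pdp)
{
	free(pdp->used_pdpes);
	free(pdp->page_directory);
	pdp->used_pdpes = NULL;
	pdp->page_directory = NULL;
	pdp->pdpes = 0;
}

/* Mark entry 'bit' as used; the bound comes from the run-time count. */
static void sketch_set_used(struct sketch_pdp *pdp, size_t bit)
{
	assert(bit < pdp->pdpes);
	pdp->used_pdpes[bit / SKETCH_BITS_PER_LONG] |=
		1UL << (bit % SKETCH_BITS_PER_LONG);
}

/* Return 1 if entry 'bit' is marked used, 0 otherwise. */
static int sketch_test_used(const struct sketch_pdp *pdp, size_t bit)
{
	assert(bit < pdp->pdpes);
	return (pdp->used_pdpes[bit / SKETCH_BITS_PER_LONG] >>
		(bit % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

The same init path serves both cases: sketch_pdp_init(&pdp, 4) for the
legacy layout and sketch_pdp_init(&pdp, 512) for the 48-bit one.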
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic
2015-08-05 15:49 ` Michel Thierry
@ 2015-08-05 15:51 ` Michel Thierry
2015-08-06 12:28 ` Daniel Vetter
1 sibling, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-05 15:51 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On 8/5/2015 4:49 PM, Michel Thierry wrote:
> On 8/5/2015 4:31 PM, Daniel Vetter wrote:
>> On Wed, Jul 29, 2015 at 05:23:46PM +0100, Michel Thierry wrote:
> v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
> v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
> v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
> v11:
Pressed _send_ too fast,
v11: Expand commit message (Daniel).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-07-29 16:24 ` [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
2015-07-30 5:39 ` Goel, Akash
@ 2015-08-05 15:58 ` Daniel Vetter
2015-08-05 16:14 ` Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Daniel Vetter @ 2015-08-05 15:58 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Jul 29, 2015 at 05:24:01PM +0100, Michel Thierry wrote:
> There are some allocations that must be only referenced by 32-bit
> offsets. To limit the chances of having the first 4GB already full,
> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> DRM_MM_CREATE_TOP flags
>
> In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State
> Offset are limited to 32-bits.
>
> Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
> they can be allocated above the 32-bit address range. To limit the
> chances of having the first 4GB already full, objects will use
> DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
>
> v2: Changed flag logic from needs_32b, to supports_48b.
> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
> to use last PIN_ defined instead of hard-coded value; use correct limit
> check in eb_vma_misplaced. (Chris)
> v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
> v6: Apply pin-high for ggtt too (Chris)
> v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
> Fix check for entries currently using +4GB addresses, use min_t and
> other polish in object_bind_to_vm (Chris)
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
For the record, where can I find the mesa patches for this? I think for
simple things like this, a References: line pointing to the relevant UMD
patches in the mailing-list archives would be great.
-Daniel
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++------
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++++++++++
> include/uapi/drm/i915_drm.h | 3 ++-
> 4 files changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ed2fbcd..c344805 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2775,6 +2775,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
> #define PIN_OFFSET_BIAS (1<<3)
> #define PIN_USER (1<<4)
> #define PIN_UPDATE (1<<5)
> +#define PIN_ZONE_4G (1<<6)
> +#define PIN_HIGH (1<<7)
> #define PIN_OFFSET_MASK (~4095)
> int __must_check
> i915_gem_object_pin(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 80f5d97..e1ca63f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3349,11 +3349,9 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> struct drm_device *dev = obj->base.dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> u32 fence_alignment, unfenced_alignment;
> + u32 search_flag, alloc_flag;
> + u64 start, end;
> u64 size, fence_size;
> - u64 start =
> - flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> - u64 end =
> - flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
> struct i915_vma *vma;
> int ret;
>
> @@ -3393,6 +3391,13 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> }
>
> + start = flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
> + end = vm->total;
> + if (flags & PIN_MAPPABLE)
> + end = min_t(u64, end, dev_priv->gtt.mappable_end);
> + if (flags & PIN_ZONE_4G)
> + end = min_t(u64, end, (1ULL << 32));
> +
> if (alignment == 0)
> alignment = flags & PIN_MAPPABLE ? fence_alignment :
> unfenced_alignment;
> @@ -3428,13 +3433,21 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> if (IS_ERR(vma))
> goto err_unpin;
>
> + if (flags & PIN_HIGH) {
> + search_flag = DRM_MM_SEARCH_BELOW;
> + alloc_flag = DRM_MM_CREATE_TOP;
> + } else {
> + search_flag = DRM_MM_SEARCH_DEFAULT;
> + alloc_flag = DRM_MM_CREATE_DEFAULT;
> + }
> +
> search_free:
> ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
> size, alignment,
> obj->cache_level,
> start, end,
> - DRM_MM_SEARCH_DEFAULT,
> - DRM_MM_CREATE_DEFAULT);
> + search_flag,
> + alloc_flag);
> if (ret) {
> ret = i915_gem_evict_something(dev, vm, size, alignment,
> obj->cache_level,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 923a3c4..78fc881 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -589,11 +589,20 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> flags |= PIN_GLOBAL;
>
> + /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> + * limit address to the first 4GBs for unflagged objects.
> + */
> + flags |= PIN_ZONE_4G;
> + if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
> + flags &= ~PIN_ZONE_4G;
> +
> if (!drm_mm_node_allocated(&vma->node)) {
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> flags |= PIN_GLOBAL | PIN_MAPPABLE;
> if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
> flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> + if ((flags & PIN_MAPPABLE) == 0)
> + flags |= PIN_HIGH;
> }
>
> ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
> @@ -671,6 +680,10 @@ eb_vma_misplaced(struct i915_vma *vma)
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP && !obj->map_and_fenceable)
> return !only_mappable_for_reloc(entry->flags);
>
> + if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) == 0 &&
> + (vma->node.start + vma->node.size - 1) >> 32)
> + return true;
> +
> return false;
> }
>
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index dbd16a2..08e047c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -690,7 +690,8 @@ struct drm_i915_gem_exec_object2 {
> #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
> #define EXEC_OBJECT_NEEDS_GTT (1<<1)
> #define EXEC_OBJECT_WRITE (1<<2)
> -#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
> +#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
> +#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
> __u64 flags;
>
> __u64 rsvd1;
> --
> 2.4.5
>
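For readers skimming the diff above, here is a standalone sketch of the three pieces of logic it adds: clamping the end of the search range, deriving the pin flags from the execbuffer entry flags, and detecting an unflagged object that currently sits above 4 GiB. The bit values for PIN_ZONE_4G, PIN_HIGH and EXEC_OBJECT_SUPPORTS_48B_ADDRESS are copied from the diff; the PIN_MAPPABLE value and all function names are assumptions for this example only. It also simplifies the real code, which only sets PIN_HIGH when the node is not yet allocated and sets PIN_GLOBAL together with PIN_MAPPABLE.

```c
#include <stdbool.h>
#include <stdint.h>

#define PIN_MAPPABLE	(1u << 0)	/* assumed value, not shown in the diff */
#define PIN_ZONE_4G	(1u << 6)
#define PIN_HIGH	(1u << 7)
#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1u << 3)

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Upper bound of the search range, in the same order as the diff. */
static uint64_t bind_range_end(uint64_t vm_total, uint64_t mappable_end,
			       unsigned int flags)
{
	uint64_t end = vm_total;

	if (flags & PIN_MAPPABLE)
		end = min_u64(end, mappable_end);
	if (flags & PIN_ZONE_4G)
		end = min_u64(end, 1ULL << 32);
	return end;
}

/* Objects are limited to the first 4 GiB unless userspace flags them. */
static unsigned int reserve_pin_flags(uint64_t entry_flags, bool needs_map)
{
	unsigned int flags = PIN_ZONE_4G;

	if (entry_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
		flags &= ~PIN_ZONE_4G;
	if (needs_map)
		flags |= PIN_MAPPABLE;
	else
		flags |= PIN_HIGH;	/* allocate top-down when possible */
	return flags;
}

/* True when an unflagged object ends above 4 GiB and must be rebound. */
static bool misplaced_above_4g(uint64_t entry_flags, uint64_t start,
			       uint64_t size)
{
	return !(entry_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
	       ((start + size - 1) >> 32) != 0;
}
```

Taking the minimum for each restriction, rather than picking one end or the other, is what lets a mappable and a 4G-zone request combine correctly.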
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 15/19] drm/i915: batch_obj vm offset must be u64
2015-07-29 16:23 ` [PATCH v6 15/19] drm/i915: batch_obj vm offset must " Michel Thierry
2015-07-30 5:23 ` Goel, Akash
@ 2015-08-05 16:01 ` Daniel Vetter
2015-08-05 16:14 ` Michel Thierry
1 sibling, 1 reply; 82+ messages in thread
From: Daniel Vetter @ 2015-08-05 16:01 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Jul 29, 2015 at 05:23:59PM +0100, Michel Thierry wrote:
> Otherwise it can overflow in 48-bit mode, and cause an incorrect
> exec_start.
>
> Before commit 5f19e2bffa63a91cd4ac1adcec648e14a44277ce ("drm/i915: Merged
> the many do_execbuf() parameters into a structure"), it was already an u64.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
So we have a few more u64, but the i915_gem_obj_offset is still unsigned
long. Am I missing a patch?
-Daniel
> ---
> drivers/gpu/drm/i915/i915_drv.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 33926d9..ed2fbcd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1674,7 +1674,7 @@ struct i915_execbuffer_params {
> struct drm_file *file;
> uint32_t dispatch_flags;
> uint32_t args_batch_start_offset;
> - uint32_t batch_obj_vm_offset;
> + uint64_t batch_obj_vm_offset;
> struct intel_engine_cs *ring;
> struct drm_i915_gem_object *batch_obj;
> struct intel_context *ctx;
> --
> 2.4.5
>
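To make the overflow in the quoted commit message concrete, here is a small sketch comparing a 32-bit and a 64-bit offset field. It assumes exec_start is computed as the stored vm offset plus the batch start offset; the helper names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Batch start computed through a 32-bit field (the pre-patch type). */
static uint64_t exec_start_u32(uint64_t vm_offset, uint32_t batch_start_offset)
{
	uint32_t stored = (uint32_t)vm_offset;	/* upper 32 bits are dropped */

	return (uint64_t)stored + batch_start_offset;
}

/* Batch start computed through a 64-bit field (the post-patch type). */
static uint64_t exec_start_u64(uint64_t vm_offset, uint32_t batch_start_offset)
{
	uint64_t stored = vm_offset;

	return stored + batch_start_offset;
}
```

For an object bound below 4 GiB the two agree; once the object is placed above 4 GiB the 32-bit variant silently points somewhere else.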
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v9 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-08-05 15:46 ` Daniel Vetter
@ 2015-08-05 16:13 ` Michel Thierry
0 siblings, 0 replies; 82+ messages in thread
From: Michel Thierry @ 2015-08-05 16:13 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, Akash Goel
On 8/5/2015 4:46 PM, Daniel Vetter wrote:
> On Mon, Aug 03, 2015 at 09:53:27AM +0100, Michel Thierry wrote:
>> @@ -735,12 +736,21 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
>> {
>> struct i915_hw_ppgtt *ppgtt =
>> container_of(vm, struct i915_hw_ppgtt, base);
>> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
>> -
>> gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
>> I915_CACHE_LLC, use_scratch);
>>
>> - gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
>> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
>
> Hm, this isn't pretty, and looking through earlier patches you have a lot
> of if (48BIT) functions right at the topmost level where we have vfuncs,
> e.g. gen8_ppgtt_cleanup. Imo much better to just do a
> gen8_legacy_ppgtt_cleanup and gen8_4lvl_ppgtt_cleanup. Yeah means we
> duplicate the call to free_scratch but really that's meh - we committed to
> that abstraction so let's use it.
>
> But reworking all the patches to get rid of all the 48bit ifs and
> exploiting the vfunc abstraction we have will be a bit of work, so I'll
> keep merging and sign you up for that follow-up task. The usual design
> when we have vfuncs should be:
> - do per-platform vfuncs everywhere you need a split
> - for functionality shared between different vfuncs extract common helper
> functions and call them from both places.
>
> E.g. for this case here I think we need a new 4lvl insert_entries
> function which calls the existing one in a loop, and a 3lvl insert_entries
> function which calls the existing one for the single legacy pdp. Make 2
> copies of this and pull out the if to the top-level of each, then
> simplify.
>
> If we have abstraction in the form of vfuncs _and_ pile in lots of ifs at
> low level then we pay both the price for the abstraction and the price for
> tightly knit code, i.e. both downsides without an upside.
>
> Anyway, I expect follow-up work here ;-)
>
> Thanks, Daniel
>
Yes, all the main functions (alloc, clear, cleanup, dump, insert) have:
    if (USES_FULL_48BIT_PPGTT)
        pml4 function
    else
        pdp function
I'll make a patch to set the correct ones as vfuncs in gen8_ppgtt_init.
-Michel
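A minimal sketch of what that follow-up could look like, assuming a simplified stand-in structure: the 48-bit decision is made once at init by choosing the vfunc, and both variants share a per-PDP helper. All names here (struct ppgtt, clear_pdp_range, legacy_clear_range, pml4_clear_range, ppgtt_init) are hypothetical and not the driver's; the helper only counts calls so the control flow can be checked.

```c
#include <stdbool.h>
#include <stdint.h>

#define PML4E_SHIFT 39	/* one PML4 entry (one PDP) covers 512 GiB */

struct ppgtt {
	unsigned int pdps_cleared;	/* bookkeeping for this example only */
	void (*clear_range)(struct ppgtt *ppgtt, uint64_t start,
			    uint64_t length);
};

/* Shared helper: clear a range that lies within a single PDP. */
static void clear_pdp_range(struct ppgtt *ppgtt, uint64_t start,
			    uint64_t length)
{
	(void)start;
	(void)length;
	ppgtt->pdps_cleared++;
}

/* 3-level variant: there is only the single legacy PDP. */
static void legacy_clear_range(struct ppgtt *ppgtt, uint64_t start,
			       uint64_t length)
{
	clear_pdp_range(ppgtt, start, length);
}

/* 4-level variant: walk the range one PML4 entry at a time. */
static void pml4_clear_range(struct ppgtt *ppgtt, uint64_t start,
			     uint64_t length)
{
	uint64_t end = start + length;

	while (start < end) {
		uint64_t next = ((start >> PML4E_SHIFT) + 1) << PML4E_SHIFT;

		if (next > end)
			next = end;
		clear_pdp_range(ppgtt, start, next - start);
		start = next;
	}
}

/* The only place that looks at the 48-bit setting. */
static void ppgtt_init(struct ppgtt *ppgtt, bool full_48bit)
{
	ppgtt->pdps_cleared = 0;
	ppgtt->clear_range = full_48bit ? pml4_clear_range
					: legacy_clear_range;
}
```

Callers then go through ppgtt->clear_range() without testing the mode, which matches the "per-platform vfuncs plus common helpers" layout suggested in the review.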
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 15/19] drm/i915: batch_obj vm offset must be u64
2015-08-05 16:01 ` Daniel Vetter
@ 2015-08-05 16:14 ` Michel Thierry
2015-08-06 12:30 ` Daniel Vetter
0 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-08-05 16:14 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On 8/5/2015 5:01 PM, Daniel Vetter wrote:
> On Wed, Jul 29, 2015 at 05:23:59PM +0100, Michel Thierry wrote:
>> Otherwise it can overflow in 48-bit mode, and cause an incorrect
>> exec_start.
>>
>> Before commit 5f19e2bffa63a91cd4ac1adcec648e14a44277ce ("drm/i915: Merged
>> the many do_execbuf() parameters into a structure"), it was already an u64.
>>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>
> So we have a few more u64, but the i915_gem_obj_offset is still unsigned
> long. Am I missing a patch?
http://news.gmane.org/find-root.php?message_id=1437063498-31930-1-git-send-email-michel.thierry@intel.com
Which I need to re-send, addressing the comments I got.
Thanks for reminding me.
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-08-05 15:58 ` Daniel Vetter
@ 2015-08-05 16:14 ` Michel Thierry
2015-08-06 12:47 ` Daniel Vetter
0 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-08-05 16:14 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On 8/5/2015 4:58 PM, Daniel Vetter wrote:
> On Wed, Jul 29, 2015 at 05:24:01PM +0100, Michel Thierry wrote:
>> There are some allocations that must be only referenced by 32-bit
>> offsets. To limit the chances of having the first 4GB already full,
>> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
>> DRM_MM_CREATE_TOP flags
>>
>> In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
>> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
>> 32-bit range, because the General State Offset and Instruction State
>> Offset are limited to 32-bits.
>>
>> Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
>> they can be allocated above the 32-bit address range. To limit the
>> chances of having the first 4GB already full, objects will use
>> DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
>>
>> v2: Changed flag logic from needs_32b, to supports_48b.
>> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
>> v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
>> to use last PIN_ defined instead of hard-coded value; use correct limit
>> check in eb_vma_misplaced. (Chris)
>> v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
>> v6: Apply pin-high for ggtt too (Chris)
>> v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
>> Fix check for entries currently using +4GB addresses, use min_t and
>> other polish in object_bind_to_vm (Chris)
>>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Akash Goel <akash.goel@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>
> For the record, where can I find the mesa patches for this? I think for
> simple things like this a References: line point to the relevant UMD
> patches in mailing-list archives would be great.
> -Daniel
>
Here they are,
References:
http://lists.freedesktop.org/archives/dri-devel/2015-July/085501.html
and http://lists.freedesktop.org/archives/mesa-dev/2015-July/088003.html
The name for the macro will be OUT_RELOC64_INSIDE_4G, as suggested by Chris.
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic
2015-08-05 15:49 ` Michel Thierry
2015-08-05 15:51 ` Michel Thierry
@ 2015-08-06 12:28 ` Daniel Vetter
1 sibling, 0 replies; 82+ messages in thread
From: Daniel Vetter @ 2015-08-06 12:28 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Aug 05, 2015 at 04:49:17PM +0100, Michel Thierry wrote:
> On 8/5/2015 4:31 PM, Daniel Vetter wrote:
> >On Wed, Jul 29, 2015 at 05:23:46PM +0100, Michel Thierry wrote:
> >>This transitional patch doesn't do much for the existing code. However,
> >>it should make upcoming patches to use the full 48b address space a bit
> >>easier.
> >
> >commit message should also mention how exactly it's more dynamic and why
> >exactly that's useful ... It's ofc possible to infer that from the
> >context, but that won't be the case any more if you look at the patch
> >alone (with git blame or after a bisect). Please follow up with a few
> >words so I can add them to the commit message.
> >-Daniel
> >
>
> Hi Daniel,
> Agree the description is vague. Here's the updated commit message, let me
> know what you think (and if you want a new patch):
>
> drm/i915/gen8: Make pdp allocation more dynamic
>
> This transitional patch doesn't do much for the existing code. However,
> it should make upcoming patches to use the full 48b address space a bit
> easier.
>
> 32-bit ppgtt uses just 4 PDPs, while 48-bit ppgtt will have up-to 512;
> this patch prepares the existing functions to query the right number of pdps
> at run-time. This also means that used_pdpes should also be allocated during
> ppgtt_init, as the bitmap size will depend on the ppgtt address range
> selected.
Existing patch amended, thanks.
-Daniel
>
> v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
> v3: To facilitate testing, 48b mode will be available on Broadwell and
> GEN9+, when i915.enable_ppgtt = 3.
> v4: Rebase after s/page_tables/page_table/, added extra information
> about 4-level page table formats and use IS_ENABLED macro.
> v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
> v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and
> follow his nomenclature in pdp functions (there is no alloc_pdp yet).
> v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
> v8: Rebase after final merged version of Mika's ppgtt/scratch patches.
> v9: Introduce PML4 (and 48-bit checks) until next patch (Akash).
> v10: Also use test_bit to detect when pd/pt are already allocated (Akash)
> v11:
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 15/19] drm/i915: batch_obj vm offset must be u64
2015-08-05 16:14 ` Michel Thierry
@ 2015-08-06 12:30 ` Daniel Vetter
0 siblings, 0 replies; 82+ messages in thread
From: Daniel Vetter @ 2015-08-06 12:30 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Aug 05, 2015 at 05:14:03PM +0100, Michel Thierry wrote:
> On 8/5/2015 5:01 PM, Daniel Vetter wrote:
> >On Wed, Jul 29, 2015 at 05:23:59PM +0100, Michel Thierry wrote:
> >>Otherwise it can overflow in 48-bit mode, and cause an incorrect
> >>exec_start.
> >>
> >>Before commit 5f19e2bffa63a91cd4ac1adcec648e14a44277ce ("drm/i915: Merged
> >>the many do_execbuf() parameters into a structure"), it was already an u64.
> >>
> >>Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> >
> >So we have a few more u64, but the i915_gem_obj_offset is still unsigned
> >long. Am I missing a patch?
>
> http://news.gmane.org/find-root.php?message_id=1437063498-31930-1-git-send-email-michel.thierry@intel.com
>
> Which I need to re-send with the comments I got.
> Thanks for remind me.
Process reminder: If your patch series has dependencies, either
- include them at the start (git will correctly keep authorship), which is
the preferred approach
- or at least mention your dependencies in the cover letter
Relying on your maintainer's mind-reader to figure this out doesn't scale ;-)
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-08-05 16:14 ` Michel Thierry
@ 2015-08-06 12:47 ` Daniel Vetter
2015-08-06 16:27 ` Michel Thierry
0 siblings, 1 reply; 82+ messages in thread
From: Daniel Vetter @ 2015-08-06 12:47 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Wed, Aug 05, 2015 at 05:14:33PM +0100, Michel Thierry wrote:
> On 8/5/2015 4:58 PM, Daniel Vetter wrote:
> >On Wed, Jul 29, 2015 at 05:24:01PM +0100, Michel Thierry wrote:
> >>There are some allocations that must be only referenced by 32-bit
> >>offsets. To limit the chances of having the first 4GB already full,
> >>objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> >>DRM_MM_CREATE_TOP flags
> >>
> >>In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
> >>General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> >>32-bit range, because the General State Offset and Instruction State
> >>Offset are limited to 32-bits.
> >>
> >>Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
> >>they can be allocated above the 32-bit address range. To limit the
> >>chances of having the first 4GB already full, objects will use
> >>DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
> >>
> >>v2: Changed flag logic from needs_32b, to supports_48b.
> >>v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> >>v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
> >>to use last PIN_ defined instead of hard-coded value; use correct limit
> >>check in eb_vma_misplaced. (Chris)
> >>v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
> >>v6: Apply pin-high for ggtt too (Chris)
> >>v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
> >> Fix check for entries currently using +4GB addresses, use min_t and
> >> other polish in object_bind_to_vm (Chris)
> >>
> >>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>Cc: Akash Goel <akash.goel@intel.com>
> >>Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
> >>Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> >
> >For the record, where can I find the mesa patches for this? I think for
> >simple things like this a References: line point to the relevant UMD
> >patches in mailing-list archives would be great.
> >-Daniel
> >
>
> Here they are,
>
> References:
> http://lists.freedesktop.org/archives/dri-devel/2015-July/085501.html and
> http://lists.freedesktop.org/archives/mesa-dev/2015-July/088003.html
Sounds like there's still another revision we need to do on those?
-Daniel
>
> The name for the macro will be OUT_RELOC64_INSIDE_4G, as suggested by Chris.
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-08-06 12:47 ` Daniel Vetter
@ 2015-08-06 16:27 ` Michel Thierry
2015-08-07 7:55 ` Daniel Vetter
0 siblings, 1 reply; 82+ messages in thread
From: Michel Thierry @ 2015-08-06 16:27 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On 8/6/2015 1:47 PM, Daniel Vetter wrote:
> On Wed, Aug 05, 2015 at 05:14:33PM +0100, Michel Thierry wrote:
>> On 8/5/2015 4:58 PM, Daniel Vetter wrote:
>>> On Wed, Jul 29, 2015 at 05:24:01PM +0100, Michel Thierry wrote:
>>>> There are some allocations that must be only referenced by 32-bit
>>>> offsets. To limit the chances of having the first 4GB already full,
>>>> objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
>>>> DRM_MM_CREATE_TOP flags
>>>>
>>>> In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
>>>> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
>>>> 32-bit range, because the General State Offset and Instruction State
>>>> Offset are limited to 32-bits.
>>>>
>>>> Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
>>>> they can be allocated above the 32-bit address range. To limit the
>>>> chances of having the first 4GB already full, objects will use
>>>> DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
>>>>
>>>> v2: Changed flag logic from needs_32b, to supports_48b.
>>>> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
>>>> v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
>>>> to use last PIN_ defined instead of hard-coded value; use correct limit
>>>> check in eb_vma_misplaced. (Chris)
>>>> v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
>>>> v6: Apply pin-high for ggtt too (Chris)
>>>> v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
>>>> Fix check for entries currently using +4GB addresses, use min_t and
>>>> other polish in object_bind_to_vm (Chris)
>>>>
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Akash Goel <akash.goel@intel.com>
>>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
>>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>>
>>> For the record, where can I find the mesa patches for this? I think for
>>> simple things like this a References: line point to the relevant UMD
>>> patches in mailing-list archives would be great.
>>> -Daniel
>>>
>>
>> Here they are,
>>
>> References:
>> http://lists.freedesktop.org/archives/dri-devel/2015-July/085501.html and
>> http://lists.freedesktop.org/archives/mesa-dev/2015-July/088003.html
>
> Sounds like there's still another revision we need to do on those?
Yes, a couple of changes: make the set/clear functions internal to libdrm
and update the symbol-check test.
I put it on hold, because I was also asked not to include the libdrm
changes until the updated kernel header
(EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag) was merged.
Then I also need to create a libdrm release, and update mesa's
dependency to this new version number.
-Michel
^ permalink raw reply [flat|nested] 82+ messages in thread
* Re: [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-08-06 16:27 ` Michel Thierry
@ 2015-08-07 7:55 ` Daniel Vetter
0 siblings, 0 replies; 82+ messages in thread
From: Daniel Vetter @ 2015-08-07 7:55 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Thu, Aug 06, 2015 at 05:27:38PM +0100, Michel Thierry wrote:
> On 8/6/2015 1:47 PM, Daniel Vetter wrote:
> >On Wed, Aug 05, 2015 at 05:14:33PM +0100, Michel Thierry wrote:
> >>On 8/5/2015 4:58 PM, Daniel Vetter wrote:
> >>>On Wed, Jul 29, 2015 at 05:24:01PM +0100, Michel Thierry wrote:
> >>>>There are some allocations that must be only referenced by 32-bit
> >>>>offsets. To limit the chances of having the first 4GB already full,
> >>>>objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
> >>>>DRM_MM_CREATE_TOP flags
> >>>>
> >>>>In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
> >>>>General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> >>>>32-bit range, because the General State Offset and Instruction State
> >>>>Offset are limited to 32-bits.
> >>>>
> >>>>Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
> >>>>they can be allocated above the 32-bit address range. To limit the
> >>>>chances of having the first 4GB already full, objects will use
> >>>>DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
> >>>>
> > >>>>v2: Changed flag logic from needs_32b, to supports_48b.
> >>>>v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> >>>>v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
> >>>>to use last PIN_ defined instead of hard-coded value; use correct limit
> >>>>check in eb_vma_misplaced. (Chris)
> >>>>v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
> >>>>v6: Apply pin-high for ggtt too (Chris)
> >>>>v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
> >>>> Fix check for entries currently using +4GB addresses, use min_t and
> >>>> other polish in object_bind_to_vm (Chris)
> >>>>
> >>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>Cc: Akash Goel <akash.goel@intel.com>
> >>>>Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
> >>>>Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> >>>
> >>>For the record, where can I find the mesa patches for this? I think for
> >>>simple things like this a References: line point to the relevant UMD
> >>>patches in mailing-list archives would be great.
> >>>-Daniel
> >>>
> >>
> >>Here they are,
> >>
> >>References:
> >>http://lists.freedesktop.org/archives/dri-devel/2015-July/085501.html and
> >>http://lists.freedesktop.org/archives/mesa-dev/2015-July/088003.html
> >
> >Sounds like there's still another revision we need to do on those?
>
> Yes, a couple of changes, set/clear functions internal in libdrm and update
> the symbol-check test.
>
> I put it on hold, because I was also asked to not include the libdrm changes
> until the updated kernel header (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag) was
> merged.
>
> Then I also need to create a libdrm release, and update mesa's dependency to
> this new version number.
Nope, we need everything before I can pull in the kernel patch. Once that
happens, you can do the release/dependency dance (of course, don't include
those bits in your proposed patches yet).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 82+ messages in thread
end of thread, other threads:[~2015-08-07 7:55 UTC | newest]
Thread overview: 82+ messages
2015-07-29 16:23 [PATCH v6 00/19] 48-bit PPGTT Michel Thierry
2015-07-29 16:23 ` [PATCH v6 01/19] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
2015-07-30 3:06 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 02/19] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
2015-07-30 3:18 ` Goel, Akash
2015-08-05 15:31 ` Daniel Vetter
2015-08-05 15:49 ` Michel Thierry
2015-08-05 15:51 ` Michel Thierry
2015-08-06 12:28 ` Daniel Vetter
2015-07-29 16:23 ` [PATCH v6 03/19] drm/i915/gen8: Abstract PDP usage Michel Thierry
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
2015-07-31 4:11 ` Goel, Akash
2015-08-05 15:33 ` Daniel Vetter
2015-07-29 16:23 ` [PATCH v6 04/19] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-07-30 4:46 ` Goel, Akash
2015-07-30 9:31 ` Michel Thierry
2015-07-30 10:02 ` [PATCH v7 " Michel Thierry
2015-07-31 4:00 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 05/19] drm/i915/gen8: Add dynamic page trace events Michel Thierry
2015-07-30 3:48 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 06/19] drm/i915/gen8: Add PML4 structure Michel Thierry
2015-07-30 4:01 ` Goel, Akash
2015-07-30 9:31 ` Michel Thierry
2015-07-30 10:04 ` [PATCH v7 " Michel Thierry
2015-07-31 4:35 ` Goel, Akash
2015-07-31 12:12 ` [PATCH v8 " Michel Thierry
2015-07-31 17:35 ` Goel, Akash
2015-08-03 8:34 ` Michel Thierry
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
2015-08-03 9:20 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 07/19] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
2015-07-30 10:05 ` [PATCH v7 " Michel Thierry
2015-07-31 4:20 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 08/19] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
2015-07-30 4:14 ` Goel, Akash
2015-07-30 9:36 ` Michel Thierry
2015-07-30 10:06 ` [PATCH v7 " Michel Thierry
2015-07-31 4:23 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 09/19] drm/i915/gen8: Pass sg_iter through pte inserts Michel Thierry
2015-07-30 4:19 ` Goel, Akash
2015-08-03 8:52 ` [PATCH v9 " Michel Thierry
2015-07-29 16:23 ` [PATCH v6 10/19] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-07-30 4:50 ` Goel, Akash
2015-08-03 8:53 ` [PATCH v9 " Michel Thierry
2015-08-03 9:23 ` Goel, Akash
2015-08-05 15:46 ` Daniel Vetter
2015-08-05 16:13 ` Michel Thierry
2015-07-29 16:23 ` [PATCH v6 11/19] drm/i915/gen8: Initialize PDPs and PML4 Michel Thierry
2015-07-30 4:56 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 12/19] drm/i915: Expand error state's address width to 64b Michel Thierry
2015-07-30 5:09 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 13/19] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
2015-07-30 5:20 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 14/19] drm/i915: object size needs to be u64 Michel Thierry
2015-07-30 5:22 ` Goel, Akash
2015-07-29 16:23 ` [PATCH v6 15/19] drm/i915: batch_obj vm offset must " Michel Thierry
2015-07-30 5:23 ` Goel, Akash
2015-08-05 16:01 ` Daniel Vetter
2015-08-05 16:14 ` Michel Thierry
2015-08-06 12:30 ` Daniel Vetter
2015-07-29 16:24 ` [PATCH v6 16/19] drm/i915/userptr: Kill user_size limit check Michel Thierry
2015-07-30 5:25 ` Goel, Akash
2015-07-29 16:24 ` [PATCH v6 17/19] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
2015-07-30 5:39 ` Goel, Akash
2015-08-05 15:58 ` Daniel Vetter
2015-08-05 16:14 ` Michel Thierry
2015-08-06 12:47 ` Daniel Vetter
2015-08-06 16:27 ` Michel Thierry
2015-08-07 7:55 ` Daniel Vetter
2015-07-29 16:24 ` [PATCH v6 18/19] drm/i915/gen8: Flip the 48b switch Michel Thierry
2015-07-30 5:49 ` Goel, Akash
2015-07-30 10:09 ` [PATCH v7 " Michel Thierry
2015-07-31 12:13 ` [PATCH v8 " Michel Thierry
2015-07-31 12:19 ` Chris Wilson
2015-07-31 12:35 ` Michel Thierry
2015-07-31 17:21 ` Goel, Akash
2015-07-29 16:24 ` [PATCH v6 19/19] drm/i915: Save some page table setup on repeated binds Michel Thierry
2015-07-30 11:26 ` [PATCH v6 00/19] 48-bit PPGTT Chris Wilson
2015-07-30 11:52 ` Michel Thierry
2015-07-30 12:13 ` Chris Wilson
2015-07-30 19:02 ` Chris Wilson
2015-08-03 9:51 ` Michel Thierry