* [PATCH 00/16] 48b PPGTT
@ 2015-05-26 14:21 Michel Thierry
2015-05-26 14:21 ` [PATCH 01/16] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
` (16 more replies)
0 siblings, 17 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
In order to expand the GPU address space, a 4th translation level is added: the
Page Map Level 4 (PML4). The PML4 has 512 PML4 Entries (PML4Es), PML4[0-511],
each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
reused; only the 4th-level changes are added on top. I also updated some
remaining variables that were still 32b only.
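As a sketch, a 48b legacy address then decodes as below (shift/mask values taken from the i915_gem_gtt.h changes later in this series; GEN8_PML4E_MASK is an assumption by analogy with the other 9-bit levels):

```c
#include <assert.h>
#include <stdint.h>

/* From i915_gem_gtt.h in this series; PML4E_MASK assumed (9 bits/level) */
#define GEN8_PML4E_SHIFT 39
#define GEN8_PML4E_MASK  0x1ff
#define GEN8_PDPE_SHIFT  30
#define GEN8_PDPE_MASK   0x1ff

/* 47:39 | 38:30 | 29:21 | 20:12 | 11:0
 * PML4E | PDPE  | PDE   | PTE   | offset */
static uint32_t gen8_pml4e_index(uint64_t addr)
{
	return (addr >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
}

static uint32_t gen8_pdpe_index(uint64_t addr)
{
	return (addr >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
}
```

Each PML4E spans 512GB, so the full PML4 covers the 256TB (48b) address space.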
There are 2 hardware workarounds needed to allow correct operation with 48b
addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). I added a
flag (EXEC_OBJECT_NEEDS_32BADDRESS) that indicates whether a given object must be
allocated inside the first 4 PDPs and, to limit the chances of the first 4GB
already being full, objects not requiring this workaround are allocated starting
after this range. Another option would be to send the DRM_MM_CREATE_TOP flag.
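A minimal sketch of the placement rule the flag implies (the bit value and helper name here are hypothetical; only EXEC_OBJECT_NEEDS_32BADDRESS itself comes from this series):

```c
#include <assert.h>
#include <stdint.h>

#define EXEC_OBJECT_NEEDS_32BADDRESS (1ULL << 3) /* hypothetical bit value */

/* Wa32bitGeneralStateOffset / Wa32bitInstructionBaseOffset: flagged
 * objects must land inside the first 4 PDPs, i.e. below 4GB. */
static uint64_t vma_search_end(uint64_t exec_flags, uint64_t vm_total)
{
	if (exec_flags & EXEC_OBJECT_NEEDS_32BADDRESS)
		return 1ULL << 32;
	return vm_total;
}
```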
I'm also including an igt test for this change.
This feature is only available on BDW and Gen9+, and requires LRC submission
mode (execlists) as well as i915.enable_ppgtt=3.
Also note that this expanded address space is only available for full PPGTT,
aliasing PPGTT remains 32b.
Finally, Mika has sent some PPGTT clean up patches, which will conflict with
these. I'm open to rebase these patches after Mika's, or update his patches
after the 48b ones. Please let me know which option is better.
Michel Thierry (16):
drm/i915: Remove unnecessary gen8_clamp_pd
drm/i915/gen8: Make pdp allocation more dynamic
drm/i915/gen8: Abstract PDP usage
drm/i915/gen8: Add dynamic page trace events
drm/i915/gen8: implement alloc/free for 4lvl
drm/i915/gen8: Add 4 level switching infrastructure and lrc support
drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
drm/i915: Plumb sg_iter through va allocation ->maps
drm/i915/gen8: Add 4 level support in insert_entries and clear_range
drm/i915/gen8: Initialize PDPs
drm/i915: Expand error state's address width to 64b
drm/i915/gen8: Add ppgtt info and debug_dump
drm/i915: object size needs to be u64
drm/i915: Check against correct user_size limit in 48b ppgtt mode
drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
drm/i915/gen8: Flip the 48b switch
drivers/gpu/drm/i915/i915_debugfs.c | 18 +-
drivers/gpu/drm/i915/i915_drv.h | 12 +-
drivers/gpu/drm/i915/i915_gem.c | 16 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +
drivers/gpu/drm/i915/i915_gem_gtt.c | 740 ++++++++++++++++++++++++-----
drivers/gpu/drm/i915/i915_gem_gtt.h | 70 ++-
drivers/gpu/drm/i915/i915_gem_userptr.c | 12 +-
drivers/gpu/drm/i915/i915_gpu_error.c | 17 +-
drivers/gpu/drm/i915/i915_params.c | 2 +-
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/i915_trace.h | 16 +
drivers/gpu/drm/i915/intel_lrc.c | 50 +-
include/uapi/drm/i915_drm.h | 3 +-
13 files changed, 797 insertions(+), 163 deletions(-)
--
2.4.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 01/16] drm/i915: Remove unnecessary gen8_clamp_pd
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 02/16] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
` (15 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory boundary.
Furthermore, i915_pte_count also restricts to the next page table
boundary.
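The redundancy is easy to check outside the kernel: the removed helper computes exactly the per-iteration step that the gen8_for_each_pdpe macro already derives via ALIGN()/min() (userspace sketch of both):

```c
#include <assert.h>
#include <stdint.h>

#define GEN8_PDPE_SHIFT 30
#define ALIGN(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* The helper this patch removes */
static uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
{
	uint64_t next_pd = ALIGN(start + 1, 1ULL << GEN8_PDPE_SHIFT);

	if (next_pd > (start + length))
		return length;

	return next_pd - start;
}

/* The step the iterator macro already computes every iteration */
static uint64_t pdpe_step(uint64_t start, uint64_t length)
{
	uint64_t temp = ALIGN(start + 1, 1ULL << GEN8_PDPE_SHIFT) - start;

	return MIN(temp, length);
}
```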
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
drivers/gpu/drm/i915/i915_gem_gtt.h | 11 -----------
2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 17b7df0..5036ca0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -878,7 +878,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
gen8_pde_t *const page_directory = kmap_atomic(pd->page);
struct i915_page_table *pt;
- uint64_t pd_len = gen8_clamp_pd(start, length);
+ uint64_t pd_len = length;
uint64_t pd_start = start;
uint32_t pde;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 0d46dd2..15476e0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -431,17 +431,6 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
-/* Clamp length to the next page_directory boundary */
-static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
-{
- uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
-
- if (next_pd > (start + length))
- return length;
-
- return next_pd - start;
-}
-
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
--
2.4.0
* [PATCH 02/16] drm/i915/gen8: Make pdp allocation more dynamic
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
2015-05-26 14:21 ` [PATCH 01/16] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 03/16] drm/i915/gen8: Abstract PDP usage Michel Thierry
` (14 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
This transitional patch doesn't do much for the existing code. However,
it should make the upcoming patches that use the full 48b address space a
bit easier. The patch also introduces the PML4, i.e. the new top-level
structure of the page tables.
v2: Renamed pdp_free to be similar to pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/ and added extra information
about 4-level page table formats.
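A userspace sketch of the I915_PDPES_PER_PDP() distinction this patch introduces: in legacy 32b mode the single (implicit) PDP has 4 entries programmed via registers, while in 48b mode each PDP is a real structure with 512 entries:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GEN8_LEGACY_PDPES    4
#define GEN8_PML4ES_PER_PML4 512

/* Mirrors I915_PDPES_PER_PDP(dev) from the i915_gem_gtt.h hunk in this
 * patch, with the device check reduced to a bool for illustration. */
static size_t pdpes_per_pdp(bool uses_full_48bit_ppgtt)
{
	return uses_full_48bit_ppgtt ? GEN8_PML4ES_PER_PML4
				     : GEN8_LEGACY_PDPES;
}
```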
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_drv.h | 7 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 107 +++++++++++++++++++++++++++++-------
drivers/gpu/drm/i915/i915_gem_gtt.h | 45 ++++++++++++---
3 files changed, 129 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9adfd12..9eea844 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2436,7 +2436,12 @@ struct drm_i915_cmd_table {
#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
#define USES_PPGTT(dev) (i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev) (i915.enable_ppgtt >= 2)
+#ifdef CONFIG_64BIT
+# define USES_FULL_48BIT_PPGTT(dev) (i915.enable_ppgtt == 3)
+#else
+# define USES_FULL_48BIT_PPGTT(dev) false
+#endif
#define HAS_OVERLAY(dev) (INTEL_INFO(dev)->has_overlay)
#define OVERLAY_NEEDS_PHYSICAL(dev) (INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5036ca0..a288f6b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,10 +104,18 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
{
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+ bool has_full_64bit_ppgtt;
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+#ifdef CONFIG_64BIT
+ has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
+ INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+#else
+ has_full_64bit_ppgtt = false;
+#endif
+
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +133,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
+ if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+ return 3;
+
#ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -451,6 +462,45 @@ free_pd:
return ERR_PTR(ret);
}
+static void __pdp_fini(struct i915_page_directory_pointer *pdp)
+{
+ kfree(pdp->used_pdpes);
+ kfree(pdp->page_directory);
+ /* HACK */
+ pdp->page_directory = NULL;
+}
+
+static void unmap_and_free_pdp(struct i915_page_directory_pointer *pdp,
+ struct drm_device *dev)
+{
+ __pdp_fini(pdp);
+ if (USES_FULL_48BIT_PPGTT(dev))
+ kfree(pdp);
+}
+
+static int __pdp_init(struct i915_page_directory_pointer *pdp,
+ struct drm_device *dev)
+{
+ size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+ pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+ sizeof(unsigned long),
+ GFP_KERNEL);
+ if (!pdp->used_pdpes)
+ return -ENOMEM;
+
+ pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory), GFP_KERNEL);
+ if (!pdp->page_directory) {
+ kfree(pdp->used_pdpes);
+ /* the PDP might be the statically allocated top level. Keep it
+ * as clean as possible */
+ pdp->used_pdpes = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
/* Broadwell Page Directory Pointer Descriptors */
static int gen8_write_pdp(struct intel_engine_cs *ring,
unsigned entry,
@@ -480,7 +530,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
{
int i, ret;
- for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+ for (i = 3; i >= 0; i--) {
struct i915_page_directory *pd = ppgtt->pdp.page_directory[i];
dma_addr_t pd_daddr = pd ? pd->daddr : ppgtt->scratch_pd->daddr;
/* The page directory might be NULL, but we need to clear out
@@ -569,9 +619,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
- if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
- break;
-
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -653,7 +700,8 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
container_of(vm, struct i915_hw_ppgtt, base);
int i;
- for_each_set_bit(i, ppgtt->pdp.used_pdpes, GEN8_LEGACY_PDPES) {
+ for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+ I915_PDPES_PER_PDP(ppgtt->base.dev)) {
if (WARN_ON(!ppgtt->pdp.page_directory[i]))
continue;
@@ -661,6 +709,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
}
+ unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
}
@@ -753,8 +802,9 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
struct i915_page_directory *pd;
uint64_t temp;
uint32_t pdpe;
+ size_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
- WARN_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+ WARN_ON(!bitmap_empty(new_pds, pdpes));
/* FIXME: upper bound must not overflow 32 bits */
WARN_ON((start + length) > (1ULL << 32));
@@ -775,18 +825,19 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
return 0;
unwind_out:
- for_each_set_bit(pdpe, new_pds, GEN8_LEGACY_PDPES)
+ for_each_set_bit(pdpe, new_pds, pdpes)
unmap_and_free_pd(pdp->page_directory[pdpe], dev);
return -ENOMEM;
}
static void
-free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
+free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts,
+ size_t pdpes)
{
int i;
- for (i = 0; i < GEN8_LEGACY_PDPES; i++)
+ for (i = 0; i < pdpes; i++)
kfree(new_pts[i]);
kfree(new_pts);
kfree(new_pds);
@@ -797,23 +848,24 @@ free_gen8_temp_bitmaps(unsigned long *new_pds, unsigned long **new_pts)
*/
static
int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
- unsigned long ***new_pts)
+ unsigned long ***new_pts,
+ size_t pdpes)
{
int i;
unsigned long *pds;
unsigned long **pts;
- pds = kcalloc(BITS_TO_LONGS(GEN8_LEGACY_PDPES), sizeof(unsigned long), GFP_KERNEL);
+ pds = kcalloc(BITS_TO_LONGS(pdpes), sizeof(unsigned long), GFP_KERNEL);
if (!pds)
return -ENOMEM;
- pts = kcalloc(GEN8_LEGACY_PDPES, sizeof(unsigned long *), GFP_KERNEL);
+ pts = kcalloc(pdpes, sizeof(unsigned long *), GFP_KERNEL);
if (!pts) {
kfree(pds);
return -ENOMEM;
}
- for (i = 0; i < GEN8_LEGACY_PDPES; i++) {
+ for (i = 0; i < pdpes; i++) {
pts[i] = kcalloc(BITS_TO_LONGS(I915_PDES),
sizeof(unsigned long), GFP_KERNEL);
if (!pts[i])
@@ -826,7 +878,7 @@ int __must_check alloc_gen8_temp_bitmaps(unsigned long **new_pds,
return 0;
err_out:
- free_gen8_temp_bitmaps(pds, pts);
+ free_gen8_temp_bitmaps(pds, pts, pdpes);
return -ENOMEM;
}
@@ -842,6 +894,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
const uint64_t orig_length = length;
uint64_t temp;
uint32_t pdpe;
+ size_t pdpes = I915_PDPES_PER_PDP(dev);
int ret;
/* Wrap is never okay since we can only represent 48b, and we don't
@@ -850,7 +903,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
if (WARN_ON(start + length < start))
return -ERANGE;
- ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables);
+ ret = alloc_gen8_temp_bitmaps(&new_page_dirs, &new_page_tables, pdpes);
if (ret)
return ret;
@@ -858,7 +911,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
new_page_dirs);
if (ret) {
- free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
@@ -914,7 +967,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
set_bit(pdpe, ppgtt->pdp.used_pdpes);
}
- free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return 0;
err_out:
@@ -923,10 +976,10 @@ err_out:
unmap_and_free_pt(ppgtt->pdp.page_directory[pdpe]->page_table[temp], vm->dev);
}
- for_each_set_bit(pdpe, new_page_dirs, GEN8_LEGACY_PDPES)
+ for_each_set_bit(pdpe, new_page_dirs, pdpes)
unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
- free_gen8_temp_bitmaps(new_page_dirs, new_page_tables);
+ free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
@@ -950,8 +1003,22 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
gen8_initialize_pt(&ppgtt->base, ppgtt->scratch_pt);
gen8_initialize_pd(&ppgtt->base, ppgtt->scratch_pd);
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ int ret = __pdp_init(&ppgtt->pdp, ppgtt->base.dev);
+
+ if (ret) {
+ unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
+ unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+ return ret;
+ }
+
+ ppgtt->base.total = 1ULL << 32;
+ } else {
+ ppgtt->base.total = 1ULL << 48;
+ return -EPERM; /* Not yet implemented */
+ }
+
ppgtt->base.start = 0;
- ppgtt->base.total = 1ULL << 32;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
ppgtt->base.allocate_va_range = gen8_alloc_va_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 15476e0..a01cc34 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -88,9 +88,17 @@ typedef uint64_t gen8_pde_t;
* PDPE | PDE | PTE | offset
* The difference as compared to normal x86 3 level page table is the PDPEs are
* programmed via register.
+ *
+ * GEN8 48b legacy style address is defined as a 4 level page table:
+ * 47:39 | 38:30 | 29:21 | 20:12 | 11:0
+ * PML4E | PDPE | PDE | PTE | offset
*/
+#define GEN8_PML4ES_PER_PML4 512
+#define GEN8_PML4E_SHIFT 39
#define GEN8_PDPE_SHIFT 30
-#define GEN8_PDPE_MASK 0x3
+/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
+ * tables */
+#define GEN8_PDPE_MASK 0x1ff
#define GEN8_PDE_SHIFT 21
#define GEN8_PDE_MASK 0x1ff
#define GEN8_PTE_SHIFT 12
@@ -98,6 +106,13 @@ typedef uint64_t gen8_pde_t;
#define GEN8_LEGACY_PDPES 4
#define GEN8_PTES I915_PTES(sizeof(gen8_pte_t))
+#ifdef CONFIG_64BIT
+# define I915_PDPES_PER_PDP(dev) (USES_FULL_48BIT_PPGTT(dev) ?\
+ GEN8_PML4ES_PER_PML4 : GEN8_LEGACY_PDPES)
+#else
+# define I915_PDPES_PER_PDP(dev) GEN8_LEGACY_PDPES
+#endif
+
#define PPAT_UNCACHED_INDEX (_PAGE_PWT | _PAGE_PCD)
#define PPAT_CACHED_PDE_INDEX 0 /* WB LLC */
#define PPAT_CACHED_INDEX _PAGE_PAT /* WB LLCeLLC */
@@ -224,9 +239,17 @@ struct i915_page_directory {
};
struct i915_page_directory_pointer {
- /* struct page *page; */
- DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
- struct i915_page_directory *page_directory[GEN8_LEGACY_PDPES];
+ struct page *page;
+ dma_addr_t daddr;
+ unsigned long *used_pdpes;
+ struct i915_page_directory **page_directory;
+};
+
+struct i915_pml4 {
+ struct page *page;
+ dma_addr_t daddr;
+ DECLARE_BITMAP(used_pml4es, GEN8_PML4ES_PER_PML4);
+ struct i915_page_directory_pointer *pdps[GEN8_PML4ES_PER_PML4];
};
struct i915_address_space {
@@ -325,8 +348,9 @@ struct i915_hw_ppgtt {
struct drm_mm_node node;
unsigned long pd_dirty_rings;
union {
- struct i915_page_directory_pointer pdp;
- struct i915_page_directory pd;
+ struct i915_pml4 pml4; /* GEN8+ & 64b PPGTT */
+ struct i915_page_directory_pointer pdp; /* GEN8+ */
+ struct i915_page_directory pd; /* GEN6-7 */
};
struct i915_page_table *scratch_pt;
@@ -423,14 +447,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
-#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
- for (iter = gen8_pdpe_index(start); \
- pd = (pdp)->page_directory[iter], length > 0 && iter < GEN8_LEGACY_PDPES; \
+#define gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, b) \
+ for (iter = gen8_pdpe_index(start); \
+ pd = (pdp)->page_directory[iter], length > 0 && (iter < b); \
iter++, \
temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start, \
temp = min(temp, length), \
start += temp, length -= temp)
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
+ gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
--
2.4.0
* [PATCH 03/16] drm/i915/gen8: Abstract PDP usage
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
2015-05-26 14:21 ` [PATCH 01/16] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
2015-05-26 14:21 ` [PATCH 02/16] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 04/16] drm/i915/gen8: Add dynamic page trace events Michel Thierry
` (13 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.
In preparation for 4-level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, with the pdp being just one of the entries
pointed to by a pml4e.
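As a simplified sketch of the future lookup (types reduced to the bare minimum; the real structures live in i915_gem_gtt.h), the pdp stops being a fixed member and becomes whatever the selected pml4e points to:

```c
#include <assert.h>
#include <stdint.h>

#define GEN8_PML4E_SHIFT     39
#define GEN8_PML4ES_PER_PML4 512

struct i915_pdp { int dummy; };

struct i915_pml4 {
	struct i915_pdp *pdps[GEN8_PML4ES_PER_PML4];
};

static struct i915_pdp pdp_lo, pdp_hi;
static struct i915_pml4 pml4 = {
	.pdps = { [0] = &pdp_lo, [1] = &pdp_hi },
};

/* With 4 levels, the PDP for an address is selected by bits 47:39 */
static struct i915_pdp *pdp_for_address(struct i915_pml4 *p, uint64_t addr)
{
	return p->pdps[(addr >> GEN8_PML4E_SHIFT) & 0x1ff];
}
```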
v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 141 +++++++++++++++++++++++-------------
1 file changed, 90 insertions(+), 51 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a288f6b..a950f26 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -550,6 +550,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr, scratch_pte;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -565,10 +566,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
struct i915_page_table *pt;
struct page *page_table;
- if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+ if (WARN_ON(!pdp->page_directory[pdpe]))
continue;
- pd = ppgtt->pdp.page_directory[pdpe];
+ pd = pdp->page_directory[pdpe];
if (WARN_ON(!pd->page_table[pde]))
continue;
@@ -610,6 +611,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -620,7 +622,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
if (pt_vaddr == NULL) {
- struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
+ struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
struct page *page_table = pt->page;
@@ -675,6 +677,28 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
if (!HAS_LLC(vm->dev))
drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
+ kunmap_atomic(page_directory);
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_page_directory *pd,
+ uint64_t start,
+ uint64_t length,
+ struct drm_device *dev)
+{
+ gen8_pde_t * const page_directory = kmap_atomic(pd->page);
+ struct i915_page_table *pt;
+ uint64_t temp, pde;
+
+ gen8_for_each_pde(pt, pd, start, length, temp, pde)
+ __gen8_do_map_pt(page_directory + pde, pt, dev);
+
+ if (!HAS_LLC(dev))
+ drm_clflush_virt_range(page_directory, PAGE_SIZE);
+
kunmap_atomic(page_directory);
}
@@ -700,23 +724,29 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
container_of(vm, struct i915_hw_ppgtt, base);
int i;
- for_each_set_bit(i, ppgtt->pdp.used_pdpes,
- I915_PDPES_PER_PDP(ppgtt->base.dev)) {
- if (WARN_ON(!ppgtt->pdp.page_directory[i]))
- continue;
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+ I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+ if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+ continue;
- gen8_free_page_tables(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
- unmap_and_free_pd(ppgtt->pdp.page_directory[i], ppgtt->base.dev);
+ gen8_free_page_tables(ppgtt->pdp.page_directory[i],
+ ppgtt->base.dev);
+ unmap_and_free_pd(ppgtt->pdp.page_directory[i],
+ ppgtt->base.dev);
+ }
+ unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
+ } else {
+ WARN_ON(1); /* to be implemented later */
}
- unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
}
/**
* gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
- * @ppgtt: Master ppgtt structure.
+ * @vm: Master vm structure.
* @pd: Page directory for this address range.
* @start: Starting virtual address to begin allocations.
* @length Size of the allocations.
@@ -732,13 +762,15 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
*
* Return: 0 if success; negative error code otherwise.
*/
-static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
struct i915_page_directory *pd,
uint64_t start,
uint64_t length,
unsigned long *new_pts)
{
- struct drm_device *dev = ppgtt->base.dev;
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct drm_device *dev = vm->dev;
struct i915_page_table *pt;
uint64_t temp;
uint32_t pde;
@@ -755,7 +787,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
if (IS_ERR(pt))
goto unwind_out;
- gen8_initialize_pt(&ppgtt->base, pt);
+ gen8_initialize_pt(vm, pt);
pd->page_table[pde] = pt;
set_bit(pde, new_pts);
}
@@ -771,7 +803,7 @@ unwind_out:
/**
* gen8_ppgtt_alloc_page_directories() - Allocate page directories for VA range.
- * @ppgtt: Master ppgtt structure.
+ * @vm: Master vm structure.
* @pdp: Page directory pointer for this address range.
* @start: Starting virtual address to begin allocations.
* @length Size of the allocations.
@@ -792,17 +824,18 @@ unwind_out:
*
* Return: 0 if success; negative error code otherwise.
*/
-static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
- struct i915_page_directory_pointer *pdp,
- uint64_t start,
- uint64_t length,
- unsigned long *new_pds)
+static int
+gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length,
+ unsigned long *new_pds)
{
- struct drm_device *dev = ppgtt->base.dev;
+ struct drm_device *dev = vm->dev;
struct i915_page_directory *pd;
uint64_t temp;
uint32_t pdpe;
- size_t pdpes = I915_PDPES_PER_PDP(ppgtt->base.dev);
+ size_t pdpes = I915_PDPES_PER_PDP(vm->dev);
WARN_ON(!bitmap_empty(new_pds, pdpes));
@@ -817,7 +850,7 @@ static int gen8_ppgtt_alloc_page_directories(struct i915_hw_ppgtt *ppgtt,
if (IS_ERR(pd))
goto unwind_out;
- gen8_initialize_pd(&ppgtt->base, pd);
+ gen8_initialize_pd(vm, pd);
pdp->page_directory[pdpe] = pd;
set_bit(pdpe, new_pds);
}
@@ -882,13 +915,13 @@ err_out:
return -ENOMEM;
}
-static int gen8_alloc_va_range(struct i915_address_space *vm,
- uint64_t start,
- uint64_t length)
+static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length)
{
- struct i915_hw_ppgtt *ppgtt =
- container_of(vm, struct i915_hw_ppgtt, base);
unsigned long *new_page_dirs, **new_page_tables;
+ struct drm_device *dev = vm->dev;
struct i915_page_directory *pd;
const uint64_t orig_start = start;
const uint64_t orig_length = length;
@@ -908,16 +941,14 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
return ret;
/* Do the allocations first so we can easily bail out */
- ret = gen8_ppgtt_alloc_page_directories(ppgtt, &ppgtt->pdp, start, length,
- new_page_dirs);
+ ret = gen8_ppgtt_alloc_page_directories(vm, pdp, start, length, new_page_dirs);
if (ret) {
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
- /* For every page directory referenced, allocate page tables */
- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
- ret = gen8_ppgtt_alloc_pagetabs(ppgtt, pd, start, length,
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+ ret = gen8_ppgtt_alloc_pagetabs(vm, pd, start, length,
new_page_tables[pdpe]);
if (ret)
goto err_out;
@@ -926,10 +957,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
start = orig_start;
length = orig_length;
- /* Allocations have completed successfully, so set the bitmaps, and do
- * the mappings. */
- gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
- gen8_pde_t *const page_directory = kmap_atomic(pd->page);
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
struct i915_page_table *pt;
uint64_t pd_len = length;
uint64_t pd_start = start;
@@ -951,20 +979,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
/* Our pde is now pointing to the pagetable, pt */
set_bit(pde, pd->used_pdes);
-
- /* Map the PDE to the page table */
- __gen8_do_map_pt(page_directory + pde, pt, vm->dev);
-
- /* NB: We haven't yet mapped ptes to pages. At this
- * point we're still relying on insert_entries() */
}
- if (!HAS_LLC(vm->dev))
- drm_clflush_virt_range(page_directory, PAGE_SIZE);
-
- kunmap_atomic(page_directory);
-
- set_bit(pdpe, ppgtt->pdp.used_pdpes);
+ set_bit(pdpe, pdp->used_pdpes);
+ gen8_map_pagetable_range(pd, start, length, dev);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -973,16 +991,37 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
err_out:
while (pdpe--) {
for_each_set_bit(temp, new_page_tables[pdpe], I915_PDES)
- unmap_and_free_pt(ppgtt->pdp.page_directory[pdpe]->page_table[temp], vm->dev);
+ unmap_and_free_pt(pdp->page_directory[pdpe]->page_table[temp], dev);
}
for_each_set_bit(pdpe, new_page_dirs, pdpes)
- unmap_and_free_pd(ppgtt->pdp.page_directory[pdpe], vm->dev);
+ unmap_and_free_pd(pdp->page_directory[pdpe], dev);
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
return ret;
}
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+ struct i915_pml4 *pml4,
+ uint64_t start,
+ uint64_t length)
+{
+ WARN_ON(1); /* to be implemented later */
+ return 0;
+}
+
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+ uint64_t start, uint64_t length)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (!USES_FULL_48BIT_PPGTT(vm->dev))
+ return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+ else
+ return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+}
+
/*
* GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
* with a net effect resembling a 2-level page table in normal x86 terms. Each
--
2.4.0
* [PATCH 04/16] drm/i915/gen8: Add dynamic page trace events
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (2 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 03/16] drm/i915/gen8: Abstract PDP usage Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 05/16] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
` (12 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
The dynamic page allocation patch series added these trace events for
GEN6; this patch adds the equivalents for GEN8.
v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 21 ++++++++++++++-------
drivers/gpu/drm/i915/i915_trace.h | 16 ++++++++++++++++
2 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a950f26..dc33314f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -684,19 +684,24 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
/* It's likely we'll map more than one pagetable at a time. This function will
* save us unnecessary kmap calls, but do no more functionally than multiple
* calls to map_pt. */
-static void gen8_map_pagetable_range(struct i915_page_directory *pd,
+static void gen8_map_pagetable_range(struct i915_address_space *vm,
+ struct i915_page_directory *pd,
uint64_t start,
- uint64_t length,
- struct drm_device *dev)
+ uint64_t length)
{
gen8_pde_t * const page_directory = kmap_atomic(pd->page);
struct i915_page_table *pt;
uint64_t temp, pde;
- gen8_for_each_pde(pt, pd, start, length, temp, pde)
- __gen8_do_map_pt(page_directory + pde, pt, dev);
+ gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+ __gen8_do_map_pt(page_directory + pde, pt, vm->dev);
+ trace_i915_page_table_entry_map(vm, pde, pt,
+ gen8_pte_index(start),
+ gen8_pte_count(start, length),
+ GEN8_PTES);
+ }
- if (!HAS_LLC(dev))
+ if (!HAS_LLC(vm->dev))
drm_clflush_virt_range(page_directory, PAGE_SIZE);
kunmap_atomic(page_directory);
@@ -790,6 +795,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
gen8_initialize_pt(vm, pt);
pd->page_table[pde] = pt;
set_bit(pde, new_pts);
+ trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
}
return 0;
@@ -853,6 +859,7 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
gen8_initialize_pd(vm, pd);
pdp->page_directory[pdpe] = pd;
set_bit(pdpe, new_pds);
+ trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
}
return 0;
@@ -982,7 +989,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
}
set_bit(pdpe, pdp->used_pdpes);
- gen8_map_pagetable_range(pd, start, length, dev);
+ gen8_map_pagetable_range(vm, pd, start, length);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 497cba5..7f68ec3 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -213,6 +213,22 @@ DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
TP_ARGS(vm, pde, start, pde_shift)
);
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_entry_alloc,
+ TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+ TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+ TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+ __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_pointer_entry_alloc,
+ TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+ TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+ TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+ __entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
/* Avoid extra math because we only support two sizes. The format is defined by
* bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
#define TRACE_PT_SIZE(bits) \
--
2.4.0
* [PATCH 05/16] drm/i915/gen8: implement alloc/free for 4lvl
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (3 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 04/16] drm/i915/gen8: Add dynamic page trace events Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 06/16] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
` (11 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.
The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp_single param
list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 198 ++++++++++++++++++++++++++++++------
drivers/gpu/drm/i915/i915_gem_gtt.h | 12 ++-
2 files changed, 177 insertions(+), 33 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index dc33314f8..7dad575 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -474,8 +474,12 @@ static void unmap_and_free_pdp(struct i915_page_directory_pointer *pdp,
struct drm_device *dev)
{
__pdp_fini(pdp);
- if (USES_FULL_48BIT_PPGTT(dev))
+
+ if (USES_FULL_48BIT_PPGTT(dev)) {
+ i915_dma_unmap_single(pdp, dev);
+ __free_page(pdp->page);
kfree(pdp);
+ }
}
static int __pdp_init(struct i915_page_directory_pointer *pdp,
@@ -501,6 +505,37 @@ static int __pdp_init(struct i915_page_directory_pointer *pdp,
return 0;
}
+static struct
+i915_page_directory_pointer *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt)
+{
+ struct drm_device *dev = ppgtt->base.dev;
+ struct i915_page_directory_pointer *pdp;
+ int ret;
+
+ WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+ pdp = kmalloc(sizeof(*pdp), GFP_KERNEL);
+ if (!pdp)
+ return ERR_PTR(-ENOMEM);
+
+ pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+ if (!pdp->page) {
+ kfree(pdp);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ ret = __pdp_init(pdp, dev);
+ if (ret) {
+ __free_page(pdp->page);
+ kfree(pdp);
+ return ERR_PTR(ret);
+ }
+
+ i915_dma_map_single(pdp, dev);
+
+ return pdp;
+}
+
/* Broadwell Page Directory Pointer Descriptors */
static int gen8_write_pdp(struct intel_engine_cs *ring,
unsigned entry,
@@ -681,6 +716,28 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
kunmap_atomic(page_directory);
}
+static void pml4_fini(struct i915_pml4 *pml4)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(pml4, struct i915_hw_ppgtt, pml4);
+ i915_dma_unmap_single(pml4, ppgtt->base.dev);
+ __free_page(pml4->page);
+ pml4->page = NULL;
+}
+
+static int pml4_init(struct i915_hw_ppgtt *ppgtt)
+{
+ struct i915_pml4 *pml4 = &ppgtt->pml4;
+
+ pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!pml4->page)
+ return -ENOMEM;
+
+ i915_dma_map_single(pml4, ppgtt->base.dev);
+
+ return 0;
+}
+
/* It's likely we'll map more than one pagetable at a time. This function will
* save us unnecessary kmap calls, but do no more functionally than multiple
* calls to map_pt. */
@@ -723,28 +780,46 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
}
}
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct i915_page_directory_pointer *pdp,
+ struct drm_device *dev)
{
- struct i915_hw_ppgtt *ppgtt =
- container_of(vm, struct i915_hw_ppgtt, base);
int i;
- if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
- for_each_set_bit(i, ppgtt->pdp.used_pdpes,
- I915_PDPES_PER_PDP(ppgtt->base.dev)) {
- if (WARN_ON(!ppgtt->pdp.page_directory[i]))
- continue;
+ for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+ if (WARN_ON(!pdp->page_directory[i]))
+ continue;
- gen8_free_page_tables(ppgtt->pdp.page_directory[i],
- ppgtt->base.dev);
- unmap_and_free_pd(ppgtt->pdp.page_directory[i],
- ppgtt->base.dev);
- }
- unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
- } else {
- WARN_ON(1); /* to be implemented later */
+ gen8_free_page_tables(pdp->page_directory[i], dev);
+ unmap_and_free_pd(pdp->page_directory[i], dev);
}
+ unmap_and_free_pdp(pdp, dev);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+ int i;
+
+ for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+ if (WARN_ON(!ppgtt->pml4.pdps[i]))
+ continue;
+
+ gen8_ppgtt_cleanup_3lvl(ppgtt->pml4.pdps[i], ppgtt->base.dev);
+ }
+
+ pml4_fini(&ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+ gen8_ppgtt_cleanup_3lvl(&ppgtt->pdp, ppgtt->base.dev);
+ else
+ gen8_ppgtt_cleanup_4lvl(ppgtt);
+
unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
}
@@ -1013,8 +1088,62 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
uint64_t start,
uint64_t length)
{
- WARN_ON(1); /* to be implemented later */
+ DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp;
+ const uint64_t orig_start = start;
+ const uint64_t orig_length = length;
+ uint64_t temp, pml4e;
+ int ret = 0;
+
+ /* Do the pml4 allocations first, so we don't need to track the newly
+ * allocated tables below the pdp */
+ bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+ /* The pagedirectory and pagetable allocations are done in the shared 3
+ * and 4 level code. Just allocate the pdps.
+ */
+ gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+ if (!pdp) {
+ WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+ pdp = alloc_pdp_single(ppgtt);
+ if (IS_ERR(pdp))
+ goto err_out;
+
+ pml4->pdps[pml4e] = pdp;
+ set_bit(pml4e, new_pdps);
+ trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
+ pml4e << GEN8_PML4E_SHIFT,
+ GEN8_PML4E_SHIFT);
+ }
+ }
+
+ WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+ "The allocation has spanned more than 512GB. "
+ "It is highly likely this is incorrect.");
+
+ start = orig_start;
+ length = orig_length;
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+ WARN_ON(!pdp);
+
+ ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+ if (ret)
+ goto err_out;
+ }
+
+ bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
+ GEN8_PML4ES_PER_PML4);
+
return 0;
+
+err_out:
+ for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
+ gen8_ppgtt_cleanup_3lvl(pml4->pdps[pml4e], vm->dev);
+
+ return ret;
}
static int gen8_alloc_va_range(struct i915_address_space *vm,
@@ -1023,10 +1152,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- if (!USES_FULL_48BIT_PPGTT(vm->dev))
- return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
- else
+ if (USES_FULL_48BIT_PPGTT(vm->dev))
return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+ else
+ return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
}
/*
@@ -1038,6 +1167,8 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
*/
static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
{
+ int ret;
+
ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
if (IS_ERR(ppgtt->scratch_pt))
return PTR_ERR(ppgtt->scratch_pt);
@@ -1049,19 +1180,19 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
gen8_initialize_pt(&ppgtt->base, ppgtt->scratch_pt);
gen8_initialize_pd(&ppgtt->base, ppgtt->scratch_pd);
- if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
- int ret = __pdp_init(&ppgtt->pdp, false);
+ if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ret = pml4_init(ppgtt);
+ if (ret)
+ goto err_out;
- if (ret) {
- unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
- unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
- return ret;
- }
+ ppgtt->base.total = 1ULL << 48;
+ } else {
+ ret = __pdp_init(&ppgtt->pdp, false);
+ if (ret)
+ goto err_out;
ppgtt->base.total = 1ULL << 32;
- } else {
- ppgtt->base.total = 1ULL << 48;
- return -EPERM; /* Not yet implemented */
+ trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
}
ppgtt->base.start = 0;
@@ -1075,6 +1206,11 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->switch_mm = gen8_mm_switch;
return 0;
+
+err_out:
+ unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
+ unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
+ return ret;
}
static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a01cc34..2229d05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -95,6 +95,7 @@ typedef uint64_t gen8_pde_t;
*/
#define GEN8_PML4ES_PER_PML4 512
#define GEN8_PML4E_SHIFT 39
+#define GEN8_PML4E_MASK (GEN8_PML4ES_PER_PML4 - 1)
#define GEN8_PDPE_SHIFT 30
/* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page
* tables */
@@ -455,6 +456,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
temp = min(temp, length), \
start += temp, length -= temp)
+#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter) \
+ for (iter = gen8_pml4e_index(start); \
+ pdp = (pml4)->pdps[iter], length > 0 && iter < GEN8_PML4ES_PER_PML4; \
+ iter++, \
+ temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start, \
+ temp = min(temp, length), \
+ start += temp, length -= temp)
+
#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
@@ -475,8 +484,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
static inline uint32_t gen8_pml4e_index(uint64_t address)
{
- WARN_ON(1); /* For 64B */
- return 0;
+ return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
}
static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
--
2.4.0
* [PATCH 06/16] drm/i915/gen8: Add 4 level switching infrastructure and lrc support
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (4 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 05/16] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 07/16] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
` (10 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address of the PML4, while the other PDP registers are ignored.
In LRC, the addressing mode must be specified in every context
descriptor.
v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 62 ++++++++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_gem_gtt.h | 2 ++
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_lrc.c | 50 +++++++++++++++++++++++-------
4 files changed, 98 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7dad575..fe29216 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -216,6 +216,9 @@ static gen8_pde_t gen8_pde_encode(struct drm_device *dev,
return pde;
}
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
static gen6_pte_t snb_pte_encode(dma_addr_t addr,
enum i915_cache_level level,
bool valid, u32 unused)
@@ -560,8 +563,8 @@ static int gen8_write_pdp(struct intel_engine_cs *ring,
return 0;
}
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
- struct intel_engine_cs *ring)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct intel_engine_cs *ring)
{
int i, ret;
@@ -578,6 +581,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
}
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct intel_engine_cs *ring)
+{
+ return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
+}
+
static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
uint64_t start,
uint64_t length,
@@ -764,6 +773,45 @@ static void gen8_map_pagetable_range(struct i915_address_space *vm,
kunmap_atomic(page_directory);
}
+static void
+gen8_setup_page_directory(struct i915_page_directory_pointer *pdp,
+ struct i915_page_directory *pd,
+ int index,
+ struct drm_device *dev)
+{
+ gen8_ppgtt_pdpe_t *page_directorypo;
+ gen8_ppgtt_pdpe_t pdpe;
+
+ if (!USES_FULL_48BIT_PPGTT(dev))
+ return;
+
+ page_directorypo = kmap_atomic(pdp->page);
+ pdpe = gen8_pdpe_encode(dev, pd->daddr, I915_CACHE_LLC);
+ page_directorypo[index] = pdpe;
+
+ if (!HAS_LLC(dev))
+ drm_clflush_virt_range(page_directorypo, PAGE_SIZE);
+
+ kunmap_atomic(page_directorypo);
+}
+
+static void
+gen8_setup_page_directory_pointer(struct i915_pml4 *pml4,
+ struct i915_page_directory_pointer *pdp,
+ int index,
+ struct drm_device *dev)
+{
+ gen8_ppgtt_pml4e_t *pagemap = kmap_atomic(pml4->page);
+ gen8_ppgtt_pml4e_t pml4e = gen8_pml4e_encode(dev, pdp->daddr, I915_CACHE_LLC);
+ WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+ pagemap[index] = pml4e;
+
+ if (!HAS_LLC(dev))
+ drm_clflush_virt_range(pagemap, PAGE_SIZE);
+
+ kunmap_atomic(pagemap);
+}
+
static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_device *dev)
{
int i;
@@ -1065,6 +1113,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
set_bit(pdpe, pdp->used_pdpes);
gen8_map_pagetable_range(vm, pd, start, length);
+ gen8_setup_page_directory(pdp, pd, pdpe, dev);
}
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1132,6 +1181,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
if (ret)
goto err_out;
+
+ gen8_setup_page_directory_pointer(pml4, pdp, pml4e, vm->dev);
}
bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
@@ -1186,12 +1237,14 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
goto err_out;
ppgtt->base.total = 1ULL << 48;
+ ppgtt->switch_mm = gen8_48b_mm_switch;
} else {
ret = __pdp_init(&ppgtt->pdp, false);
if (ret)
goto err_out;
ppgtt->base.total = 1ULL << 32;
+ ppgtt->switch_mm = gen8_legacy_mm_switch;
trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT);
}
@@ -1203,8 +1256,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
ppgtt->base.bind_vma = ppgtt_bind_vma;
- ppgtt->switch_mm = gen8_mm_switch;
-
return 0;
err_out:
@@ -1392,8 +1443,9 @@ static void gen8_ppgtt_enable(struct drm_device *dev)
int j;
for_each_ring(ring, dev_priv, j) {
+ u32 four_level = USES_FULL_48BIT_PPGTT(dev) ? GEN8_GFX_PPGTT_64B : 0;
I915_WRITE(RING_MODE_GEN7(ring),
- _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE));
+ _MASKED_BIT_ENABLE(GFX_PPGTT_ENABLE | four_level));
}
}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 2229d05..9d53b64 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -39,6 +39,8 @@ struct drm_i915_file_private;
typedef uint32_t gen6_pte_t;
typedef uint64_t gen8_pte_t;
typedef uint64_t gen8_pde_t;
+typedef uint64_t gen8_ppgtt_pdpe_t;
+typedef uint64_t gen8_ppgtt_pml4e_t;
#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6eeba63..5b5f94c 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1642,6 +1642,7 @@ enum skl_disp_power_wells {
#define GFX_REPLAY_MODE (1<<11)
#define GFX_PSMI_GRANULARITY (1<<10)
#define GFX_PPGTT_ENABLE (1<<9)
+#define GEN8_GFX_PPGTT_64B (1<<7)
#define VLV_DISPLAY_BASE 0x180000
#define VLV_MIPI_BASE VLV_DISPLAY_BASE
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 96ae90a..bbcb3cb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -197,6 +197,11 @@
reg_state[CTX_PDP ## n ## _LDW+1] = lower_32_bits(_addr); \
}
+#define ASSIGN_CTX_PML4(ppgtt, reg_state) { \
+ reg_state[CTX_PDP0_UDW + 1] = upper_32_bits(ppgtt->pml4.daddr); \
+ reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(ppgtt->pml4.daddr); \
+}
+
enum {
ADVANCED_CONTEXT = 0,
LEGACY_CONTEXT,
@@ -269,11 +274,15 @@ static uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
struct drm_device *dev = ring->dev;
uint64_t desc;
uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+ bool legacy_64bit_ctx = USES_FULL_48BIT_PPGTT(dev);
WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
desc = GEN8_CTX_VALID;
- desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+ if (legacy_64bit_ctx)
+ desc |= LEGACY_64B_CONTEXT << GEN8_CTX_MODE_SHIFT;
+ else
+ desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
if (IS_GEN8(ctx_obj->base.dev))
desc |= GEN8_CTX_L3LLC_COHERENT;
desc |= GEN8_CTX_PRIVILEGE;
@@ -344,10 +353,16 @@ static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
reg_state[CTX_RING_TAIL+1] = tail;
reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(ring_obj);
- /* True PPGTT with dynamic page allocation: update PDP registers and
- * point the unallocated PDPs to the scratch page
- */
- if (ppgtt) {
+ if (ppgtt && USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ /* True 64b PPGTT (48bit canonical)
+ * PDP0_DESCRIPTOR contains the base address to PML4 and
+ * other PDP Descriptors are ignored
+ */
+ ASSIGN_CTX_PML4(ppgtt, reg_state);
+ } else if (ppgtt) {
+ /* True 32b PPGTT with dynamic page allocation: update PDP
+ * registers and point the unallocated PDPs to the scratch page
+ */
ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
@@ -1777,13 +1792,24 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
- /* With dynamic page allocation, PDPs may not be allocated at this point,
- * Point the unallocated PDPs to the scratch page
- */
- ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
- ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+ if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ /* 64b PPGTT (48bit canonical)
+ * PDP0_DESCRIPTOR contains the base address to PML4 and
+ * other PDP Descriptors are ignored.
+ */
+ ASSIGN_CTX_PML4(ppgtt, reg_state);
+ } else {
+ /* 32b PPGTT
+ * PDP*_DESCRIPTOR contains the base address of space supported.
+ * With dynamic page allocation, PDPs may not be allocated at
+ * this point. Point the unallocated PDPs to the scratch page
+ */
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
+ ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+ }
+
if (ring->id == RCS) {
reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
reg_state[CTX_R_PWR_CLK_STATE] = GEN8_R_PWR_CLK_STATE;
--
2.4.0
* [PATCH 07/16] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (5 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 06/16] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 08/16] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
` (9 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
insert_entries was the function used to write PTEs. For the PPGTT it was
"hardcoded" to only understand two level page tables, which was the case
for GEN7. We can reuse this for 4 level page tables, and remove the
concept of insert_entries, which was never viable past 2 level page
tables anyway, but it requires a bit of rework to make the function a
bit more generic.
This patch begins the generalization work, and it will be heavily built
upon once the 48b code is complete. The patch series attempts to make
each function touch only the code specific to one page table level, and
this patch is no exception.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 56 ++++++++++++++++++++++++++-----------
1 file changed, 39 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fe29216..e71dbfc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -587,23 +587,19 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr);
}
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
- uint64_t start,
- uint64_t length,
- bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_page_directory_pointer *pdp,
+ uint64_t start,
+ uint64_t length,
+ gen8_pte_t scratch_pte,
+ const bool flush)
{
- struct i915_hw_ppgtt *ppgtt =
- container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
- gen8_pte_t *pt_vaddr, scratch_pte;
+ gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
- scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
- I915_CACHE_LLC, use_scratch);
while (num_entries) {
struct i915_page_directory *pd;
@@ -636,7 +632,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
num_entries--;
}
- if (!HAS_LLC(ppgtt->base.dev))
+ if (flush)
drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
kunmap_atomic(pt_vaddr);
@@ -648,14 +644,28 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
}
}
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
- struct sg_table *pages,
- uint64_t start,
- enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+ uint64_t start,
+ uint64_t length,
+ bool use_scratch)
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+ gen8_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
+ I915_CACHE_LLC, use_scratch);
+
+ gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte, !HAS_LLC(vm->dev));
+}
+
+static void
+gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer *pdp,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level,
+ const bool flush)
+{
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -677,7 +687,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
cache_level, true);
if (++pte == GEN8_PTES) {
- if (!HAS_LLC(ppgtt->base.dev))
+ if (flush)
drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
kunmap_atomic(pt_vaddr);
pt_vaddr = NULL;
@@ -689,12 +699,24 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
}
}
if (pt_vaddr) {
- if (!HAS_LLC(ppgtt->base.dev))
+ if (flush)
drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
kunmap_atomic(pt_vaddr);
}
}
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level,
+ u32 unused)
+{
+ struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
+ struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+ gen8_ppgtt_insert_pte_entries(pdp, pages, start, cache_level, !HAS_LLC(vm->dev));
+}
+
static void __gen8_do_map_pt(gen8_pde_t * const pde,
struct i915_page_table *pt,
struct drm_device *dev)
--
2.4.0
* [PATCH 08/16] drm/i915: Plumb sg_iter through va allocation ->maps
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (6 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 07/16] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 09/16] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
` (8 subsequent siblings)
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
As a step towards implementing 4 levels, while not discarding the
existing pte map functions, we need to pass the sg_iter through. The
current function only operates at page directory granularity, but an
object's pages may span multiple page directories; using the iterator
directly as we write the PTEs keeps it coherent through a VMA mapping
operation that spans multiple page table levels.
v2: Rebase after s/page_tables/page_table/.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 46 +++++++++++++++++++++++--------------
1 file changed, 29 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e71dbfc..2b6ee8e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -661,7 +661,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
static void
gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer *pdp,
- struct sg_table *pages,
+ struct sg_page_iter *sg_iter,
uint64_t start,
enum i915_cache_level cache_level,
const bool flush)
@@ -670,11 +670,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer *pdp,
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
- struct sg_page_iter sg_iter;
pt_vaddr = NULL;
- for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+ while (__sg_page_iter_next(sg_iter)) {
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -684,7 +683,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer *pdp,
}
pt_vaddr[pte] =
- gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+ gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
cache_level, true);
if (++pte == GEN8_PTES) {
if (flush)
@@ -713,8 +712,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+ struct sg_page_iter sg_iter;
- gen8_ppgtt_insert_pte_entries(pdp, pages, start, cache_level, !HAS_LLC(vm->dev));
+ __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+ gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
}
static void __gen8_do_map_pt(gen8_pde_t * const pde,
@@ -1067,10 +1068,12 @@ err_out:
return -ENOMEM;
}
-static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
- struct i915_page_directory_pointer *pdp,
- uint64_t start,
- uint64_t length)
+static int __gen8_alloc_vma_range_3lvl(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ struct sg_page_iter *sg_iter,
+ uint64_t start,
+ uint64_t length,
+ u32 flags)
{
unsigned long *new_page_dirs, **new_page_tables;
struct drm_device *dev = vm->dev;
@@ -1129,7 +1132,11 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
gen8_pte_index(pd_start),
gen8_pte_count(pd_start, pd_len));
- /* Our pde is now pointing to the pagetable, pt */
+ if (sg_iter) {
+ WARN_ON(!sg_iter->__nents);
+ gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
+ flags, !HAS_LLC(vm->dev));
+ }
set_bit(pde, pd->used_pdes);
}
@@ -1154,10 +1161,12 @@ err_out:
return ret;
}
-static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
- struct i915_pml4 *pml4,
- uint64_t start,
- uint64_t length)
+static int __gen8_alloc_vma_range_4lvl(struct i915_address_space *vm,
+ struct i915_pml4 *pml4,
+ struct sg_page_iter *sg_iter,
+ uint64_t start,
+ uint64_t length,
+ u32 flags)
{
DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
struct i915_hw_ppgtt *ppgtt =
@@ -1200,7 +1209,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
WARN_ON(!pdp);
- ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
+ ret = __gen8_alloc_vma_range_3lvl(vm, pdp, sg_iter,
+ start, length, flags);
if (ret)
goto err_out;
@@ -1226,9 +1236,11 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
container_of(vm, struct i915_hw_ppgtt, base);
if (USES_FULL_48BIT_PPGTT(vm->dev))
- return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+ return __gen8_alloc_vma_range_4lvl(vm, &ppgtt->pml4, NULL,
+ start, length, 0);
else
- return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+ return __gen8_alloc_vma_range_3lvl(vm, &ppgtt->pdp, NULL,
+ start, length, 0);
}
/*
--
2.4.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 09/16] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4) before it selects which Page Directory Pointer (PDP)
it will write to.
Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".
v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 51 +++++++++++++++++++++++++++++++------
drivers/gpu/drm/i915/i915_gem_gtt.h | 11 ++++++++
2 files changed, 54 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2b6ee8e..dbbf367 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -651,18 +651,33 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
{
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
gen8_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
I915_CACHE_LLC, use_scratch);
- gen8_ppgtt_clear_pte_range(pdp, start, length, scratch_pte, !HAS_LLC(vm->dev));
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_ppgtt_clear_pte_range(&ppgtt->pdp, start, length,
+ scratch_pte,
+ !HAS_LLC(ppgtt->base.dev));
+ } else {
+ uint64_t templ4, pml4e;
+ struct i915_page_directory_pointer *pdp;
+
+ gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+ uint64_t pdp_len = gen8_clamp_pdp(start, length);
+ uint64_t pdp_start = start;
+
+ gen8_ppgtt_clear_pte_range(pdp, pdp_start, pdp_len,
+ scratch_pte,
+ !HAS_LLC(ppgtt->base.dev));
+ }
+ }
}
static void
gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer *pdp,
struct sg_page_iter *sg_iter,
uint64_t start,
+ size_t pages,
enum i915_cache_level cache_level,
const bool flush)
{
@@ -673,7 +688,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_page_directory_pointer *pdp,
pt_vaddr = NULL;
- while (__sg_page_iter_next(sg_iter)) {
+ while (pages-- && __sg_page_iter_next(sg_iter)) {
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -710,12 +725,31 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
enum i915_cache_level cache_level,
u32 unused)
{
- struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
- struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
struct sg_page_iter sg_iter;
__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
- gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
+
+ if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+ gen8_ppgtt_insert_pte_entries(&ppgtt->pdp, &sg_iter, start,
+ sg_nents(pages->sgl),
+ cache_level, !HAS_LLC(vm->dev));
+ } else {
+ struct i915_page_directory_pointer *pdp;
+ uint64_t templ4, pml4e;
+ uint64_t length = sg_nents(pages->sgl) << PAGE_SHIFT;
+
+ gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+ uint64_t pdp_len = gen8_clamp_pdp(start, length) >> PAGE_SHIFT;
+ uint64_t pdp_start = start;
+
+ gen8_ppgtt_insert_pte_entries(pdp, &sg_iter,
+ pdp_start, pdp_len,
+ cache_level,
+ !HAS_LLC(vm->dev));
+ }
+ }
}
static void __gen8_do_map_pt(gen8_pde_t * const pde,
@@ -1135,7 +1169,8 @@ static int __gen8_alloc_vma_range_3lvl(struct i915_address_space *vm,
if (sg_iter) {
WARN_ON(!sg_iter->__nents);
gen8_ppgtt_insert_pte_entries(pdp, sg_iter, pd_start,
- flags, !HAS_LLC(vm->dev));
+ gen8_pte_count(pd_start, pd_len),
+ flags, !HAS_LLC(vm->dev));
}
set_bit(pde, pd->used_pdes);
}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9d53b64..9af33b2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -469,6 +469,17 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
gen8_for_each_pdpe_e(pd, pdp, start, length, temp, iter, I915_PDPES_PER_PDP(dev))
+/* Clamp length to the next page_directory pointer boundary */
+static inline uint64_t gen8_clamp_pdp(uint64_t start, uint64_t length)
+{
+ uint64_t next_pdp = ALIGN(start + 1, 1ULL << GEN8_PML4E_SHIFT);
+
+ if (next_pdp > (start + length))
+ return length;
+
+ return next_pdp - start;
+}
+
static inline uint32_t gen8_pte_index(uint64_t address)
{
return i915_pte_index(address, GEN8_PDE_SHIFT);
--
2.4.0
* [PATCH 10/16] drm/i915/gen8: Initialize PDPs
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pd before mapping (whose entries, in turn,
resolve down to the scratch page); this is to be safe in case of
out-of-bounds access or proactive prefetch.
Systems without LLC require an explicit flush.
This commit also moves gen8_initialize_pt next to the other initialize
page functions.
v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 108 ++++++++++++++++++++++++++++--------
drivers/gpu/drm/i915/i915_gem_gtt.h | 1 +
2 files changed, 86 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index dbbf367..df37c84 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -365,24 +365,6 @@ static void unmap_and_free_pt(struct i915_page_table *pt,
kfree(pt);
}
-static void gen8_initialize_pt(struct i915_address_space *vm,
- struct i915_page_table *pt)
-{
- gen8_pte_t *pt_vaddr, scratch_pte;
- int i;
-
- pt_vaddr = kmap_atomic(pt->page);
- scratch_pte = gen8_pte_encode(vm->scratch.addr,
- I915_CACHE_LLC, true);
-
- for (i = 0; i < GEN8_PTES; i++)
- pt_vaddr[i] = scratch_pte;
-
- if (!HAS_LLC(vm->dev))
- drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
- kunmap_atomic(pt_vaddr);
-}
-
static struct i915_page_table *alloc_pt_single(struct drm_device *dev)
{
struct i915_page_table *pt;
@@ -521,7 +503,7 @@ i915_page_directory_pointer *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt)
if (!pdp)
return ERR_PTR(-ENOMEM);
- pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+ pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32);
if (!pdp->page) {
kfree(pdp);
return ERR_PTR(-ENOMEM);
@@ -761,6 +743,24 @@ static void __gen8_do_map_pt(gen8_pde_t * const pde,
*pde = entry;
}
+static void gen8_initialize_pt(struct i915_address_space *vm,
+ struct i915_page_table *pt)
+{
+ gen8_pte_t *pt_vaddr, scratch_pte;
+ int i;
+
+ pt_vaddr = kmap_atomic(pt->page);
+ scratch_pte = gen8_pte_encode(vm->scratch.addr,
+ I915_CACHE_LLC, true);
+
+ for (i = 0; i < GEN8_PTES; i++)
+ pt_vaddr[i] = scratch_pte;
+
+ if (!HAS_LLC(vm->dev))
+ drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
+ kunmap_atomic(pt_vaddr);
+}
+
static void gen8_initialize_pd(struct i915_address_space *vm,
struct i915_page_directory *pd)
{
@@ -782,6 +782,55 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
kunmap_atomic(page_directory);
}
+static void gen8_initialize_pdp(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ gen8_ppgtt_pdpe_t *page_directorypo;
+ struct i915_page_directory *pd;
+ int i;
+
+ page_directorypo = kmap_atomic(pdp->page);
+ pd = (struct i915_page_directory *)ppgtt->scratch_pd;
+ for (i = 0; i < I915_PDPES_PER_PDP(vm->dev); i++) {
+ /* Map the PDPE to the page directory */
+ gen8_ppgtt_pdpe_t entry =
+ gen8_pdpe_encode(vm->dev, pd->daddr, I915_CACHE_LLC);
+ page_directorypo[i] = entry;
+ }
+
+ if (!HAS_LLC(vm->dev))
+ drm_clflush_virt_range(page_directorypo, PAGE_SIZE);
+
+ kunmap_atomic(page_directorypo);
+}
+
+static void gen8_initialize_pml4(struct i915_address_space *vm,
+ struct i915_pml4 *pml4)
+{
+ struct i915_hw_ppgtt *ppgtt =
+ container_of(vm, struct i915_hw_ppgtt, base);
+ gen8_ppgtt_pml4e_t *page_maplevel4;
+ struct i915_page_directory_pointer *pdp;
+ int i;
+
+ page_maplevel4 = kmap_atomic(pml4->page);
+ pdp = (struct i915_page_directory_pointer *)ppgtt->scratch_pdp;
+ for (i = 0; i < GEN8_PML4ES_PER_PML4; i++) {
+ /* Map the PML4E to the page directory pointer */
+ gen8_ppgtt_pml4e_t entry =
+ gen8_pml4e_encode(vm->dev, pdp->daddr,
+ I915_CACHE_LLC);
+ page_maplevel4[i] = entry;
+ }
+
+ if (!HAS_LLC(vm->dev))
+ drm_clflush_virt_range(page_maplevel4, PAGE_SIZE);
+
+ kunmap_atomic(page_maplevel4);
+}
+
static void pml4_fini(struct i915_pml4 *pml4)
{
struct i915_hw_ppgtt *ppgtt =
@@ -795,10 +844,12 @@ static int pml4_init(struct i915_hw_ppgtt *ppgtt)
{
struct i915_pml4 *pml4 = &ppgtt->pml4;
- pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ pml4->page = alloc_page(GFP_KERNEL | GFP_DMA32);
if (!pml4->page)
return -ENOMEM;
+ gen8_initialize_pml4(&ppgtt->base, pml4);
+
i915_dma_map_single(pml4, ppgtt->base.dev);
return 0;
@@ -913,6 +964,7 @@ static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
}
pml4_fini(&ppgtt->pml4);
+ unmap_and_free_pdp(ppgtt->scratch_pdp, ppgtt->base.dev);
}
static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -1220,12 +1272,12 @@ static int __gen8_alloc_vma_range_4lvl(struct i915_address_space *vm,
* and 4 level code. Just allocate the pdps.
*/
gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
- if (!pdp) {
- WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+ if (!test_bit(pml4e, pml4->used_pml4es)) {
pdp = alloc_pdp_single(ppgtt);
if (IS_ERR(pdp))
goto err_out;
+ gen8_initialize_pdp(vm, pdp);
pml4->pdps[pml4e] = pdp;
set_bit(pml4e, new_pdps);
trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
@@ -1301,9 +1353,17 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
gen8_initialize_pd(&ppgtt->base, ppgtt->scratch_pd);
if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ ppgtt->scratch_pdp = alloc_pdp_single(ppgtt);
+ if (IS_ERR(ppgtt->scratch_pdp)) {
+ ret = PTR_ERR(ppgtt->scratch_pdp);
+ goto err_out;
+ }
+
+ gen8_initialize_pdp(&ppgtt->base, ppgtt->scratch_pdp);
+
ret = pml4_init(ppgtt);
if (ret)
- goto err_out;
+ goto err_pdp_out;
ppgtt->base.total = 1ULL << 48;
ppgtt->switch_mm = gen8_48b_mm_switch;
@@ -1327,6 +1387,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
return 0;
+err_pdp_out:
+ unmap_and_free_pdp(ppgtt->scratch_pdp, ppgtt->base.dev);
err_out:
unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9af33b2..6e2f39c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -358,6 +358,7 @@ struct i915_hw_ppgtt {
struct i915_page_table *scratch_pt;
struct i915_page_directory *scratch_pd;
+ struct i915_page_directory_pointer *scratch_pdp;
struct drm_i915_file_private *file_priv;
--
2.4.0
* [PATCH 11/16] drm/i915: Expand error state's address width to 64b
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 4 ++--
drivers/gpu/drm/i915/i915_gpu_error.c | 17 +++++++++--------
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9eea844..32493f0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -507,7 +507,7 @@ struct drm_i915_error_state {
struct drm_i915_error_object {
int page_count;
- u32 gtt_offset;
+ u64 gtt_offset;
u32 *pages[0];
} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
@@ -533,7 +533,7 @@ struct drm_i915_error_state {
u32 size;
u32 name;
u32 rseqno[I915_NUM_RINGS], wseqno;
- u32 gtt_offset;
+ u64 gtt_offset;
u32 read_domains;
u32 write_domain;
s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6f42569..cdbd4c2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -197,7 +197,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
err_printf(m, " %s [%d]:\n", name, count);
while (count--) {
- err_printf(m, " %08x %8u %02x %02x [ ",
+ err_printf(m, " %016llx %8u %02x %02x [ ",
err->gtt_offset,
err->size,
err->read_domains,
@@ -426,7 +426,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
err_printf(m, " (submitted by %s [%d])",
error->ring[i].comm,
error->ring[i].pid);
- err_printf(m, " --- gtt_offset = 0x%08x\n",
+ err_printf(m, " --- gtt_offset = 0x%016llx\n",
obj->gtt_offset);
print_error_obj(m, obj);
}
@@ -434,7 +434,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
obj = error->ring[i].wa_batchbuffer;
if (obj) {
err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
- dev_priv->ring[i].name, obj->gtt_offset);
+ dev_priv->ring[i].name,
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
@@ -453,14 +454,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ringbuffer)) {
err_printf(m, "%s --- ringbuffer = 0x%08x\n",
dev_priv->ring[i].name,
- obj->gtt_offset);
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
if ((obj = error->ring[i].hws_page)) {
err_printf(m, "%s --- HW Status = 0x%08x\n",
dev_priv->ring[i].name,
- obj->gtt_offset);
+ lower_32_bits(obj->gtt_offset));
offset = 0;
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -476,13 +477,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ctx)) {
err_printf(m, "%s --- HW Context = 0x%08x\n",
dev_priv->ring[i].name,
- obj->gtt_offset);
+ lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
}
if ((obj = error->semaphore_obj)) {
- err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+ err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
elt * 4,
@@ -590,7 +591,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
int num_pages;
bool use_ggtt;
int i = 0;
- u32 reloc_offset;
+ u64 reloc_offset;
if (src == NULL || src->pages == NULL)
return NULL;
--
2.4.0
* [PATCH 12/16] drm/i915/gen8: Add ppgtt info and debug_dump
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
---
drivers/gpu/drm/i915/i915_debugfs.c | 18 ++++----
drivers/gpu/drm/i915/i915_gem_gtt.c | 88 +++++++++++++++++++++++++++++++++++++
2 files changed, 98 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fece922..c24f506 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2227,7 +2227,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
{
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
- struct drm_file *file;
int i;
if (INTEL_INFO(dev)->gen == 6)
@@ -2250,13 +2249,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
ppgtt->debug_dump(ppgtt, m);
}
- list_for_each_entry_reverse(file, &dev->filelist, lhead) {
- struct drm_i915_file_private *file_priv = file->driver_priv;
-
- seq_printf(m, "proc: %s\n",
- get_pid_task(file->pid, PIDTYPE_PID)->comm);
- idr_for_each(&file_priv->context_idr, per_file_ctx, m);
- }
seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
}
@@ -2265,6 +2257,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+ struct drm_file *file;
int ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -2276,6 +2269,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
else if (INTEL_INFO(dev)->gen >= 6)
gen6_ppgtt_info(m, dev);
+ list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+ struct drm_i915_file_private *file_priv = file->driver_priv;
+
+ seq_printf(m, "\nproc: %s\n",
+ get_pid_task(file->pid, PIDTYPE_PID)->comm);
+ idr_for_each(&file_priv->context_idr, per_file_ctx,
+ (void *)(unsigned long)m);
+ }
+
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index df37c84..b57695e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -365,6 +365,93 @@ static void unmap_and_free_pt(struct i915_page_table *pt,
kfree(pt);
}
+static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
+ uint64_t start, uint64_t length,
+ gen8_pte_t scratch_pte,
+ struct seq_file *m)
+{
+ struct i915_page_directory *pd;
+ uint64_t temp;
+ uint32_t pdpe;
+
+ gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+ struct i915_page_table *pt;
+ uint64_t pd_len = length;
+ uint64_t pd_start = start;
+ uint32_t pde;
+
+ if (!pd)
+ continue;
+
+ if (!test_bit(pdpe, pdp->used_pdpes))
+ continue;
+
+ seq_printf(m, "\tPDPE #%d\n", pdpe);
+ gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+ uint32_t pte;
+ gen8_pte_t *pt_vaddr;
+
+ if (!pt)
+ continue;
+
+ pt_vaddr = kmap_atomic(pt->page);
+ for (pte = 0; pte < GEN8_PTES; pte += 4) {
+ uint64_t va =
+ (pdpe << GEN8_PDPE_SHIFT) |
+ (pde << GEN8_PDE_SHIFT) |
+ (pte << GEN8_PTE_SHIFT);
+ int i;
+ bool found = false;
+ for (i = 0; i < 4; i++)
+ if (pt_vaddr[pte + i] != scratch_pte)
+ found = true;
+ if (!found)
+ continue;
+
+ seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
+ for (i = 0; i < 4; i++) {
+ if (pt_vaddr[pte + i] != scratch_pte)
+ seq_printf(m, " %llx", pt_vaddr[pte + i]);
+ else
+ seq_puts(m, " SCRATCH ");
+ }
+ seq_puts(m, "\n");
+ }
+ kunmap_atomic(pt_vaddr);
+ }
+ }
+}
+
+static void gen8_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
+{
+ uint64_t start = ppgtt->base.start;
+ uint64_t length = ppgtt->base.total;
+ gen8_pte_t scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
+ I915_CACHE_LLC, true);
+
+ if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+ gen8_dump_pdp(&ppgtt->pdp, start, length, scratch_pte, m);
+ } else {
+ uint64_t templ4, pml4e;
+ struct i915_pml4 *pml4 = &ppgtt->pml4;
+ struct i915_page_directory_pointer *pdp;
+
+ gen8_for_each_pml4e(pdp, pml4, start, length, templ4, pml4e) {
+ uint64_t pdp_len = length;
+ uint64_t pdp_start = start;
+
+ if (!pdp)
+ continue;
+
+ if (!test_bit(pml4e, pml4->used_pml4es))
+ continue;
+
+ seq_printf(m, " PML4E #%llu\n", pml4e);
+ gen8_dump_pdp(pdp, pdp_start, pdp_len, scratch_pte, m);
+ }
+ }
+}
+
static struct i915_page_table *alloc_pt_single(struct drm_device *dev)
{
struct i915_page_table *pt;
@@ -1384,6 +1471,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.clear_range = gen8_ppgtt_clear_range;
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
ppgtt->base.bind_vma = ppgtt_bind_vma;
+ ppgtt->debug_dump = gen8_dump_ppgtt;
return 0;
--
2.4.0
* [PATCH 13/16] drm/i915: object size needs to be u64
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.
Also add a warning for an illegal bind with size = 0.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 5 +++--
drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cc206f1..acd928d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3670,7 +3670,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
{
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
- u32 size, fence_size, fence_alignment, unfenced_alignment;
+ u32 fence_alignment, unfenced_alignment;
+ u64 size, fence_size;
unsigned long start =
flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
unsigned long end =
@@ -3729,7 +3730,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
* attempt to find space.
*/
if (size > end) {
- DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%lu\n",
+ DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%lu\n",
ggtt_view ? ggtt_view->type : 0,
size,
flags & PIN_MAPPABLE ? "mappable" : "total",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b57695e..7e8699f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3367,6 +3367,9 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
if (WARN_ON(flags == 0))
return -EINVAL;
+ if (WARN_ON(vma->node.size == 0))
+ return -EINVAL;
+
bind_flags = 0;
if (flags & PIN_GLOBAL)
bind_flags |= GLOBAL_BIND;
--
2.4.0
* [PATCH 14/16] drm/i915: Check against correct user_size limit in 48b ppgtt mode
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
The GTT is only 32b, so its maximum size is 4GB. In order to allow objects
bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl needs to check
against the maximum 48b range (1ULL << 48).
Whenever possible, read the PPGTT's total instead of the GTT's; this
will be accurate in both 32 and 48 bit modes.
v2: Use the default ctx to infer the ppgtt max size (Akash).
Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1f4e5a3..9783415 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -789,8 +789,10 @@ int
i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
{
struct drm_i915_private *dev_priv = dev->dev_private;
+ struct drm_i915_file_private *file_priv = file->driver_priv;
struct drm_i915_gem_userptr *args = data;
struct drm_i915_gem_object *obj;
+ struct intel_context *ctx;
int ret;
u32 handle;
@@ -801,8 +803,14 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
if (offset_in_page(args->user_ptr | args->user_size))
return -EINVAL;
- if (args->user_size > dev_priv->gtt.base.total)
- return -E2BIG;
+ ctx = i915_gem_context_get(file_priv, DEFAULT_CONTEXT_HANDLE);
+ if (ctx->ppgtt) {
+ if (args->user_size > ctx->ppgtt->base.total)
+ return -E2BIG;
+ } else {
+ if (args->user_size > dev_priv->gtt.base.total)
+ return -E2BIG;
+ }
if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
(char __user *)(unsigned long)args->user_ptr, args->user_size))
--
2.4.0
* [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
There are some allocations that must be referenced only by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround don't use the first 2 PDPs.
Userspace must pass the EXEC_OBJECT_NEEDS_32BADDRESS flag to indicate
that an object needs a 32b address.
The flag is ignored in 32b PPGTT.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_gem.c | 11 +++++++++++
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++
include/uapi/drm/i915_drm.h | 3 ++-
4 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 32493f0..a06f19c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2714,6 +2714,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
#define PIN_OFFSET_BIAS (1<<3)
#define PIN_USER (1<<4)
#define PIN_UPDATE (1<<5)
+#define PIN_FULL_RANGE (1<<6)
#define PIN_OFFSET_MASK (~4095)
int __must_check
i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index acd928d..a133b7d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3713,6 +3713,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
obj->tiling_mode,
false);
size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+ /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+ * limit address to 4GB-1 for objects requiring this wa; for
+ * others, start on the 2nd PDP.
+ */
+ if (USES_FULL_48BIT_PPGTT(dev)) {
+ if (flags & PIN_FULL_RANGE)
+ start += (2ULL << GEN8_PDPE_SHIFT);
+ else
+ end = ((4ULL << GEN8_PDPE_SHIFT) - 1);
+ }
}
if (alignment == 0)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bd0e4bd..3de7f0f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -588,6 +588,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
flags |= PIN_GLOBAL;
+ if (!(entry->flags & EXEC_OBJECT_NEEDS_32BADDRESS))
+ flags |= PIN_FULL_RANGE;
+
if (!drm_mm_node_allocated(&vma->node)) {
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_GLOBAL | PIN_MAPPABLE;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4851d66..ebdf6dd 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -680,7 +680,8 @@ struct drm_i915_gem_exec_object2 {
#define EXEC_OBJECT_NEEDS_FENCE (1<<0)
#define EXEC_OBJECT_NEEDS_GTT (1<<1)
#define EXEC_OBJECT_WRITE (1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_NEEDS_32BADDRESS (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_NEEDS_32BADDRESS<<1)
__u64 flags;
__u64 rsvd1;
--
2.4.0
* [PATCH 16/16] drm/i915/gen8: Flip the 48b switch
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (14 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
2015-05-26 14:21 ` [PATCH] tests/gem_ppgtt: Check Wa32bitOffsets workarounds Michel Thierry
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Use 48b addresses if hw supports it and i915.enable_ppgtt=3.
Note, aliasing PPGTT remains 32b only.
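For reference, a configuration fragment showing how the new mode would be selected once this patch lands; the parameter name and values are those in the i915_params.c hunk below:

```shell
# Load-time selection of full 64b PPGTT (requires BDW/Gen9 and execlists):
modprobe i915 enable_ppgtt=3

# Or persistently, on the kernel command line:
#   i915.enable_ppgtt=3
```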
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +----
drivers/gpu/drm/i915/i915_params.c | 2 +-
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7e8699f..75d0e4c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -111,7 +111,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
#ifdef CONFIG_64BIT
has_full_64bit_ppgtt = (IS_BROADWELL(dev) ||
- INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+ INTEL_INFO(dev)->gen >= 9);
#else
has_full_64bit_ppgtt = false;
#endif
@@ -1164,9 +1164,6 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
WARN_ON(!bitmap_empty(new_pds, pdpes));
- /* FIXME: upper bound must not overflow 32 bits */
- WARN_ON((start + length) > (1ULL << 32));
-
gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
if (pd)
continue;
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 8ac5a1b..743eefa 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -116,7 +116,7 @@ MODULE_PARM_DESC(enable_hangcheck,
module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
MODULE_PARM_DESC(enable_ppgtt,
"Override PPGTT usage. "
- "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+ "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
MODULE_PARM_DESC(enable_execlists,
--
2.4.0
* [PATCH] tests/gem_ppgtt: Check Wa32bitOffsets workarounds
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
` (15 preceding siblings ...)
2015-05-26 14:21 ` [PATCH 16/16] drm/i915/gen8: Flip the 48b switch Michel Thierry
@ 2015-05-26 14:21 ` Michel Thierry
16 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 14:21 UTC (permalink / raw)
To: intel-gfx; +Cc: akash.goel
Test the EXEC_OBJECT_NEEDS_32BADDRESS flag to use the reserved 32b
segment. The driver will try to use the lower PDPs of each PPGTT for
objects requiring Wa32bitGeneralStateOffset or
Wa32bitInstructionBaseOffset.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
tests/gem_ppgtt.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 85 insertions(+), 5 deletions(-)
diff --git a/tests/gem_ppgtt.c b/tests/gem_ppgtt.c
index d1e484a..d70dcf9 100644
--- a/tests/gem_ppgtt.c
+++ b/tests/gem_ppgtt.c
@@ -48,7 +48,15 @@
#define HEIGHT 512
#define SIZE (HEIGHT*STRIDE)
-static bool uses_full_ppgtt(int fd)
+#define EXEC_OBJECT_NEEDS_32BADDR (1<<3)
+
+/*
+ * 0 - No PPGTT
+ * 1 - Aliasing PPGTT
+ * 2 - Full PPGTT (32b)
+ * 3 - Full PPGTT (48b)
+ */
+static bool __uses_full_ppgtt(int fd, int min)
{
struct drm_i915_getparam gp;
int val = 0;
@@ -61,7 +69,17 @@ static bool uses_full_ppgtt(int fd)
return 0;
errno = 0;
- return val > 1;
+ return val >= min;
+}
+
+static bool uses_full_ppgtt(int fd)
+{
+ return __uses_full_ppgtt(fd, 2);
+}
+
+static bool uses_48b_full_ppgtt(int fd)
+{
+ return __uses_full_ppgtt(fd, 3);
}
static drm_intel_bo *create_bo(drm_intel_bufmgr *bufmgr,
@@ -216,7 +234,7 @@ static void surfaces_check(dri_bo **bo, int count, uint32_t expected)
}
}
-static uint64_t exec_and_get_offset(int fd, uint32_t batch)
+static uint64_t exec_and_get_offset(int fd, uint32_t batch, bool needs_32b_addr)
{
struct drm_i915_gem_execbuffer2 execbuf;
struct drm_i915_gem_exec_object2 exec[1];
@@ -226,6 +244,7 @@ static uint64_t exec_and_get_offset(int fd, uint32_t batch)
memset(exec, 0, sizeof(exec));
exec[0].handle = batch;
+ exec[0].flags = (needs_32b_addr) ? EXEC_OBJECT_NEEDS_32BADDR : 0;
memset(&execbuf, 0, sizeof(execbuf));
execbuf.buffers_ptr = (uintptr_t)exec;
@@ -252,7 +271,7 @@ static void flink_and_close(void)
fd2 = drm_open_any();
flinked_bo = gem_open(fd2, name);
- offset = exec_and_get_offset(fd2, flinked_bo);
+ offset = exec_and_get_offset(fd2, flinked_bo, 0);
gem_sync(fd2, flinked_bo);
gem_close(fd2, flinked_bo);
@@ -260,7 +279,7 @@ static void flink_and_close(void)
* same size should get the same offset
*/
new_bo = gem_create(fd2, 4096);
- offset_new = exec_and_get_offset(fd2, new_bo);
+ offset_new = exec_and_get_offset(fd2, new_bo, 0);
gem_close(fd2, new_bo);
igt_assert_eq(offset, offset_new);
@@ -270,6 +289,64 @@ static void flink_and_close(void)
close(fd2);
}
+static void createbo_and_compare_offsets(uint32_t fd, uint32_t fd2,
+ bool needs_32b, bool needs_32b2)
+{
+ uint32_t bo, bo2;
+ uint64_t offset, offset2;
+
+ bo = gem_create(fd, 4096);
+ offset = exec_and_get_offset(fd, bo, needs_32b);
+ gem_sync(fd, bo);
+
+ bo2 = gem_create(fd2, 4096);
+ offset2 = exec_and_get_offset(fd2, bo2, needs_32b2);
+ gem_sync(fd2, bo2);
+
+ if (needs_32b == needs_32b2)
+ igt_assert_eq(offset, offset2);
+ else
+ igt_assert_neq(offset, offset2);
+
+
+ /* lower PDPs of each PPGTT are reserved for the objects
+ * requiring this workaround
+ */
+ if (needs_32b)
+ igt_assert(offset < (1ULL << 32));
+
+ if (needs_32b2)
+ igt_assert(offset2 < (1ULL << 32));
+
+ gem_close(fd, bo);
+ gem_close(fd2, bo2);
+}
+
+
+static void wa_32b_offset_test(void)
+{
+ uint32_t fd, fd2;
+
+ fd = drm_open_any();
+ igt_require(uses_48b_full_ppgtt(fd));
+
+ fd2 = drm_open_any();
+
+ /* allow full addr range */
+ createbo_and_compare_offsets(fd, fd2, 0, 0);
+
+ /* limit 32b addr range */
+ createbo_and_compare_offsets(fd, fd2, 1, 1);
+
+ /* mixed */
+ createbo_and_compare_offsets(fd, fd2, 0, 1);
+ createbo_and_compare_offsets(fd, fd2, 1, 0);
+
+ close(fd);
+ close(fd2);
+}
+
+
#define N_CHILD 8
int main(int argc, char **argv)
{
@@ -302,5 +379,8 @@ int main(int argc, char **argv)
igt_subtest("flink-and-close-vma-leak")
flink_and_close();
+ igt_subtest("wa-32b-offset-test")
+ wa_32b_offset_test();
+
igt_exit();
}
--
2.3.6
* Re: [PATCH 09/16] drm/i915/gen8: Add 4 level support in insert_entries and clear_range
2015-05-26 14:21 ` [PATCH 09/16] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
@ 2015-05-26 15:10 ` Michel Thierry
0 siblings, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 15:10 UTC (permalink / raw)
To: intel-gfx@lists.freedesktop.org; +Cc: Goel, Akash
On 5/26/2015 3:21 PM, Michel Thierry wrote:
> When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
> Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
> it will write to.
>
> Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
>
> This patch was inspired by Ben's "Depend exclusively on map and
> unmap_vma".
>
> v2: Rebase after s/page_tables/page_table/.
> v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
> clamp_pdp in gen8_ppgtt_insert_entries (Akash).
> v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
> maintain symmetry with gen8_ppgtt_insert_entries (Akash).
> v5: Do not mix pages and bytes in insert_entries (Akash).
>
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_gtt.c | 51 +++++++++++++++++++++++++++++++------
> drivers/gpu/drm/i915/i915_gem_gtt.h | 11 ++++++++
> 2 files changed, 54 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 2b6ee8e..dbbf367 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -710,12 +725,31 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
> enum i915_cache_level cache_level,
> u32 unused)
> {
> - struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base);
> - struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
> + struct i915_hw_ppgtt *ppgtt =
> + container_of(vm, struct i915_hw_ppgtt, base);
> struct sg_page_iter sg_iter;
>
> __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
> - gen8_ppgtt_insert_pte_entries(pdp, &sg_iter, start, cache_level, !HAS_LLC(vm->dev));
> +
> + if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
> + gen8_ppgtt_insert_pte_entries(&ppgtt->pdp, &sg_iter, start,
> + sg_nents(pages->sgl),
> + cache_level, !HAS_LLC(vm->dev));
> + } else {
> + struct i915_page_directory_pointer *pdp;
> + uint64_t templ4, pml4e;
> + uint64_t length = sg_nents(pages->sgl) << PAGE_SHIFT;
Actually, this should be:
uint64_t length = (uint64_t)sg_nents(pages->sgl) << PAGE_SHIFT;
Otherwise it will overflow if we're inserting 4GB at once.
> +
> + gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
> + uint64_t pdp_len = gen8_clamp_pdp(start, length) >> PAGE_SHIFT;
> + uint64_t pdp_start = start;
> +
> + gen8_ppgtt_insert_pte_entries(pdp, &sg_iter,
> + pdp_start, pdp_len,
> + cache_level,
> + !HAS_LLC(vm->dev));
> + }
> + }
> }
>
* Re: [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-05-26 14:21 ` [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
@ 2015-05-26 15:25 ` Daniel Vetter
2015-05-26 16:56 ` Michel Thierry
2015-05-26 20:16 ` Chris Wilson
0 siblings, 2 replies; 23+ messages in thread
From: Daniel Vetter @ 2015-05-26 15:25 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx, akash.goel
On Tue, May 26, 2015 at 03:21:22PM +0100, Michel Thierry wrote:
> There are some allocations that must be only referenced by 32bit
> offsets. To limit the chances of having the first 4GB already full,
> objects not requiring this workaround don't use the first 2 PDPs.
>
> User must pass EXEC_OBJECT_NEEDS_32BADDRESS flag to indicate it needs a
> 32b address.
>
> The flag is ignored in 32b PPGTT.
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 1 +
> drivers/gpu/drm/i915/i915_gem.c | 11 +++++++++++
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++
> include/uapi/drm/i915_drm.h | 3 ++-
> 4 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 32493f0..a06f19c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2714,6 +2714,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
> #define PIN_OFFSET_BIAS (1<<3)
> #define PIN_USER (1<<4)
> #define PIN_UPDATE (1<<5)
> +#define PIN_FULL_RANGE (1<<6)
> #define PIN_OFFSET_MASK (~4095)
> int __must_check
> i915_gem_object_pin(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index acd928d..a133b7d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3713,6 +3713,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
> obj->tiling_mode,
> false);
> size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> +
> + /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> + * limit address to 4GB-1 for objects requiring this wa; for
> + * others, start on the 2nd PDP.
> + */
> + if (USES_FULL_48BIT_PPGTT(dev)) {
> + if (flags & PIN_FULL_RANGE)
> + start += (2ULL << GEN8_PDPE_SHIFT);
> + else
> + end = ((4ULL << GEN8_PDPE_SHIFT) - 1);
> + }
> }
>
> if (alignment == 0)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index bd0e4bd..3de7f0f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -588,6 +588,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
> if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
> flags |= PIN_GLOBAL;
>
> + if (!(entry->flags & EXEC_OBJECT_NEEDS_32BADDRESS))
> + flags |= PIN_FULL_RANGE;
> +
> if (!drm_mm_node_allocated(&vma->node)) {
> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
> flags |= PIN_GLOBAL | PIN_MAPPABLE;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 4851d66..ebdf6dd 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -680,7 +680,8 @@ struct drm_i915_gem_exec_object2 {
> #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
> #define EXEC_OBJECT_NEEDS_GTT (1<<1)
> #define EXEC_OBJECT_WRITE (1<<2)
> -#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
> +#define EXEC_OBJECT_NEEDS_32BADDRESS (1<<3)
> +#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_NEEDS_32BADDRESS<<1)
This is the wrong way round for existing userspace (and remember, bdw is
shipping already so we can't opt out of that). Instead the w/a needs to
allow 32bit+ addresses only if the new bit is set.
Also this needs the usual pile of userspace enabling (libdrm+mesa).
-Daniel
> __u64 flags;
>
> __u64 rsvd1;
> --
> 2.4.0
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-05-26 15:25 ` Daniel Vetter
@ 2015-05-26 16:56 ` Michel Thierry
2015-05-26 20:16 ` Chris Wilson
1 sibling, 0 replies; 23+ messages in thread
From: Michel Thierry @ 2015-05-26 16:56 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On 5/26/2015 4:25 PM, Daniel Vetter wrote:
> On Tue, May 26, 2015 at 03:21:22PM +0100, Michel Thierry wrote:
>> There are some allocations that must be only referenced by 32bit
>> offsets. To limit the chances of having the first 4GB already full,
>> objects not requiring this workaround don't use the first 2 PDPs.
>>
>> User must pass EXEC_OBJECT_NEEDS_32BADDRESS flag to indicate it needs a
>> 32b address.
>>
>> The flag is ignored in 32b PPGTT.
>>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.h | 1 +
>> drivers/gpu/drm/i915/i915_gem.c | 11 +++++++++++
>> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++
>> include/uapi/drm/i915_drm.h | 3 ++-
>> 4 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 32493f0..a06f19c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2714,6 +2714,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
>> #define PIN_OFFSET_BIAS (1<<3)
>> #define PIN_USER (1<<4)
>> #define PIN_UPDATE (1<<5)
>> +#define PIN_FULL_RANGE (1<<6)
>> #define PIN_OFFSET_MASK (~4095)
>> int __must_check
>> i915_gem_object_pin(struct drm_i915_gem_object *obj,
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index acd928d..a133b7d 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -3713,6 +3713,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
>> obj->tiling_mode,
>> false);
>> size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
>> +
>> + /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
>> + * limit address to 4GB-1 for objects requiring this wa; for
>> + * others, start on the 2nd PDP.
>> + */
>> + if (USES_FULL_48BIT_PPGTT(dev)) {
>> + if (flags & PIN_FULL_RANGE)
>> + start += (2ULL << GEN8_PDPE_SHIFT);
>> + else
>> + end = ((4ULL << GEN8_PDPE_SHIFT) - 1);
>> + }
>> }
>>
>> if (alignment == 0)
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index bd0e4bd..3de7f0f 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -588,6 +588,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>> if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
>> flags |= PIN_GLOBAL;
>>
>> + if (!(entry->flags & EXEC_OBJECT_NEEDS_32BADDRESS))
>> + flags |= PIN_FULL_RANGE;
>> +
>> if (!drm_mm_node_allocated(&vma->node)) {
>> if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
>> flags |= PIN_GLOBAL | PIN_MAPPABLE;
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 4851d66..ebdf6dd 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -680,7 +680,8 @@ struct drm_i915_gem_exec_object2 {
>> #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
>> #define EXEC_OBJECT_NEEDS_GTT (1<<1)
>> #define EXEC_OBJECT_WRITE (1<<2)
>> -#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
>> +#define EXEC_OBJECT_NEEDS_32BADDRESS (1<<3)
>> +#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_NEEDS_32BADDRESS<<1)
> This is the wrong way round for existing userspace (and remember, bdw is
> shipping already so we can't opt out of that). Instead the w/a needs to
> allow 32bit+ addresses only if the new bit is set.
>
> Also this needs the usual pile of userspace enabling (libdrm+mesa).
> -Daniel
Hi,
Ok, I'll change it to something like EXEC_OBJECT_SUPPORTS_48BADDRESS.
--Michel
>> __u64 flags;
>>
>> __u64 rsvd1;
>> --
>> 2.4.0
>>
* Re: [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-05-26 15:25 ` Daniel Vetter
2015-05-26 16:56 ` Michel Thierry
@ 2015-05-26 20:16 ` Chris Wilson
2015-05-27 12:02 ` Daniel Vetter
1 sibling, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2015-05-26 20:16 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, akash.goel
On Tue, May 26, 2015 at 05:25:48PM +0200, Daniel Vetter wrote:
> On Tue, May 26, 2015 at 03:21:22PM +0100, Michel Thierry wrote:
> > There are some allocations that must be only referenced by 32bit
> > offsets.
> > To limit the chances of having the first 4GB already full,
> > objects not requiring this workaround don't use the first 2 PDPs.
This is complete tosh. Please have a later patch that uses SEARCH_BELOW
for 48bit objects, and cite eviction rates and improvements.
> This is the wrong way round for existing userspace (and remember, bdw is
> shipping already so we can't opt out of that). Instead the w/a needs to
> allow 32bit+ addresses only if the new bit is set.
>
> Also this needs the usual pile of userspace enabling (libdrm+mesa).
Indeed, the code is very much broken as is. So I expect to see a
Testcase: igt/gem_exec_48bit in the next patch.
As a hint, consider reusing the object.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
2015-05-26 20:16 ` Chris Wilson
@ 2015-05-27 12:02 ` Daniel Vetter
0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2015-05-27 12:02 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Michel Thierry, intel-gfx,
akash.goel
On Tue, May 26, 2015 at 09:16:11PM +0100, Chris Wilson wrote:
> On Tue, May 26, 2015 at 05:25:48PM +0200, Daniel Vetter wrote:
> > On Tue, May 26, 2015 at 03:21:22PM +0100, Michel Thierry wrote:
> > > There are some allocations that must be only referenced by 32bit
> > > offsets.
>
> > > To limit the chances of having the first 4GB already full,
> > > objects not requiring this workaround don't use the first 2 PDPs.
>
> This is complete tosh. Please have a later patch that uses SEARCH_BELOW
> for 48bit objects, and cite eviction rates and improvements.
Yeah, didn't spot that at first; I agree that trying to segregate objects
upfront is premature optimization. There are very few state objects
(compared to textures), usually of different size classes and different
lifetimes. And they should all be reused anyway for efficiency, so after a
few rounds of execbuf things should settle in nicely.
Let's avoid a bit of complexity here when it's not yet proven to be required.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
end of thread, other threads: [~2015-05-27 12:00 UTC | newest]
Thread overview: 23+ messages
2015-05-26 14:21 [PATCH 00/16] 48b PPGTT Michel Thierry
2015-05-26 14:21 ` [PATCH 01/16] drm/i915: Remove unnecessary gen8_clamp_pd Michel Thierry
2015-05-26 14:21 ` [PATCH 02/16] drm/i915/gen8: Make pdp allocation more dynamic Michel Thierry
2015-05-26 14:21 ` [PATCH 03/16] drm/i915/gen8: Abstract PDP usage Michel Thierry
2015-05-26 14:21 ` [PATCH 04/16] drm/i915/gen8: Add dynamic page trace events Michel Thierry
2015-05-26 14:21 ` [PATCH 05/16] drm/i915/gen8: implement alloc/free for 4lvl Michel Thierry
2015-05-26 14:21 ` [PATCH 06/16] drm/i915/gen8: Add 4 level switching infrastructure and lrc support Michel Thierry
2015-05-26 14:21 ` [PATCH 07/16] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT Michel Thierry
2015-05-26 14:21 ` [PATCH 08/16] drm/i915: Plumb sg_iter through va allocation ->maps Michel Thierry
2015-05-26 14:21 ` [PATCH 09/16] drm/i915/gen8: Add 4 level support in insert_entries and clear_range Michel Thierry
2015-05-26 15:10 ` Michel Thierry
2015-05-26 14:21 ` [PATCH 10/16] drm/i915/gen8: Initialize PDPs Michel Thierry
2015-05-26 14:21 ` [PATCH 11/16] drm/i915: Expand error state's address width to 64b Michel Thierry
2015-05-26 14:21 ` [PATCH 12/16] drm/i915/gen8: Add ppgtt info and debug_dump Michel Thierry
2015-05-26 14:21 ` [PATCH 13/16] drm/i915: object size needs to be u64 Michel Thierry
2015-05-26 14:21 ` [PATCH 14/16] drm/i915: Check against correct user_size limit in 48b ppgtt mode Michel Thierry
2015-05-26 14:21 ` [PATCH 15/16] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset Michel Thierry
2015-05-26 15:25 ` Daniel Vetter
2015-05-26 16:56 ` Michel Thierry
2015-05-26 20:16 ` Chris Wilson
2015-05-27 12:02 ` Daniel Vetter
2015-05-26 14:21 ` [PATCH 16/16] drm/i915/gen8: Flip the 48b switch Michel Thierry
2015-05-26 14:21 ` [PATCH] tests/gem_ppgtt: Check Wa32bitOffsets workarounds Michel Thierry