* [Intel-xe] [PATCH v2 00/14] small-bar support
@ 2023-02-28 10:41 Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
` (13 more replies)
0 siblings, 14 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe
Still needs Mesa changes to consume the new uAPI, however shouldn't be much work
since Iris/ANV already support it for i915. IGTs seems to be passing from local
testing. Most of the prep patches (and semi related fixes) can already be
landed.
IGT: https://gitlab.freedesktop.org/drm/xe/igt-gpu-tools/-/merge_requests/47
v2: Rebase
--
2.39.2
^ permalink raw reply [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P()
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 14:09 ` Maarten Lankhorst
` (2 more replies)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 02/14] drm/xe/display: fix bo leak when unloading module Matthew Auld
` (12 subsequent siblings)
13 siblings, 3 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
On ADL-P we currently trigger:
xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !(dev_priv && 0))
WARNING: CPU: 3 PID: 338 at drivers/gpu/drm/xe/display/ext/intel_pch.c:32 intel_pch_type+0xce/0x2f0 [xe]
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/display/i915_drv.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/display/i915_drv.h b/drivers/gpu/drm/xe/display/i915_drv.h
index da83a957dd86..06348cb03f44 100644
--- a/drivers/gpu/drm/xe/display/i915_drv.h
+++ b/drivers/gpu/drm/xe/display/i915_drv.h
@@ -73,7 +73,7 @@ static inline struct drm_i915_private *kdev_to_i915(struct device *kdev)
#define IS_ROCKETLAKE(dev_priv) (dev_priv && 0)
#define IS_DG1(dev_priv) IS_PLATFORM(dev_priv, XE_DG1)
#define IS_ALDERLAKE_S(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_S)
-#define IS_ALDERLAKE_P(dev_priv) (dev_priv && 0)
+#define IS_ALDERLAKE_P(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_P)
#define IS_XEHPSDV(dev_priv) (dev_priv && 0)
#define IS_DG2(dev_priv) IS_PLATFORM(dev_priv, XE_DG2)
#define IS_PONTEVECCHIO(dev_priv) IS_PLATFORM(dev_priv, XE_PVC)
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 02/14] drm/xe/display: fix bo leak when unloading module
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 03/14] drm/xe: prefer xe_bo_create_pin_map() Matthew Auld
` (11 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi, Maarten Lankhorst
One of the remaining issues affecting module unload (DG2 + display)
appears to be some kind of BO leak:
WARNING: CPU: 1 PID: 1198 at drivers/gpu/drm/drm_mm.c:999 drm_mm_takedown+0x2e/0xc0
WARNING: CPU: 1 PID: 1198 at drivers/gpu/drm/drm_buddy.c:174 drm_buddy_fini+0x5e/0x70 [drm_buddy]
Turning on more debugging points at:
[drm:drm_mm_takedown] *ERROR* node [00e10000 + 00e11000]: inserted at
drm_mm_insert_node_in_range+0x2c8/0x500
__xe_ggtt_insert_bo_at+0xb0/0x120 [xe]
xe_bo_create_locked_range+0x255/0x2d0 [xe]
xe_bo_create_pin_map_at+0x4d/0x170 [xe]
intel_alloc_initial_plane_obj.isra.0+0x175/0x340 [xe]
intel_crtc_initial_plane_config+0x7e/0x270 [xe]
intel_modeset_init_nogem+0x341/0x8b0 [xe]
xe_display_init_noaccel+0x1e/0x50 [xe]
xe_device_probe+0x1fa/0x2a0 [xe]
So looks like we are leaking the object ref for the initial fb. Looking
at the i915 code we also drop the driver reference to the underlying BO,
which appears to just leave the drm framebuffer holding the last ref. Do
the same for Xe.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/display/xe_plane_initial.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/xe/display/xe_plane_initial.c b/drivers/gpu/drm/xe/display/xe_plane_initial.c
index bd5d588e8fbf..a2ef0823ed02 100644
--- a/drivers/gpu/drm/xe/display/xe_plane_initial.c
+++ b/drivers/gpu/drm/xe/display/xe_plane_initial.c
@@ -230,6 +230,8 @@ intel_find_initial_plane_obj(struct intel_crtc *crtc,
atomic_or(plane->frontbuffer_bit, &to_intel_framebuffer(fb)->bits);
+ plane_config->vma = vma;
+
/*
* Flip to the newly created mapping ASAP, so we can re-use the
* first part of GGTT for WOPCM, prevent flickering, and prevent
@@ -263,6 +265,9 @@ static void plane_config_fini(struct intel_initial_plane_config *plane_config)
else
kfree(fb);
}
+
+ if (plane_config->vma)
+ drm_gem_object_put(&plane_config->vma->bo->ttm.base);
}
void intel_crtc_initial_plane_config(struct intel_crtc *crtc)
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 03/14] drm/xe: prefer xe_bo_create_pin_map()
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 02/14] drm/xe/display: fix bo leak when unloading module Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 04/14] drm/xe/bo: explicitly reject zero sized BO Matthew Auld
` (10 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
With small-bar we likely want to annotate all the kernel users that
require CPU access with vram. If xe_bo_create_pin_map() is the central
place for that then we should have a central place to annotate.
This also simplifies the code and fixes what appears to be a double
xe_bo_put(hwe->hwsp) in the error handling.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_hw_engine.c | 23 +++-----------
drivers/gpu/drm/xe/xe_lrc.c | 53 +++++++++++--------------------
drivers/gpu/drm/xe/xe_lrc_types.h | 1 -
3 files changed, 22 insertions(+), 55 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index ae541b5e50f3..b035e2fa6744 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -310,24 +310,14 @@ static int hw_engine_init(struct xe_gt *gt, struct xe_hw_engine *hwe,
xe_reg_sr_apply_mmio(&hwe->reg_sr, gt);
xe_reg_sr_apply_whitelist(&hwe->reg_whitelist, hwe->mmio_base, gt);
- hwe->hwsp = xe_bo_create_locked(xe, gt, NULL, SZ_4K, ttm_bo_type_kernel,
- XE_BO_CREATE_VRAM_IF_DGFX(gt) |
- XE_BO_CREATE_GGTT_BIT);
+ hwe->hwsp = xe_bo_create_pin_map(xe, gt, NULL, SZ_4K, ttm_bo_type_kernel,
+ XE_BO_CREATE_VRAM_IF_DGFX(gt) |
+ XE_BO_CREATE_GGTT_BIT);
if (IS_ERR(hwe->hwsp)) {
err = PTR_ERR(hwe->hwsp);
goto err_name;
}
- err = xe_bo_pin(hwe->hwsp);
- if (err)
- goto err_unlock_put_hwsp;
-
- err = xe_bo_vmap(hwe->hwsp);
- if (err)
- goto err_unpin_hwsp;
-
- xe_bo_unlock_no_vm(hwe->hwsp);
-
err = xe_lrc_init(&hwe->kernel_lrc, hwe, NULL, NULL, SZ_16K);
if (err)
goto err_hwsp;
@@ -353,15 +343,10 @@ static int hw_engine_init(struct xe_gt *gt, struct xe_hw_engine *hwe,
return 0;
-err_unpin_hwsp:
- xe_bo_unpin(hwe->hwsp);
-err_unlock_put_hwsp:
- xe_bo_unlock_no_vm(hwe->hwsp);
- xe_bo_put(hwe->hwsp);
err_kernel_lrc:
xe_lrc_finish(&hwe->kernel_lrc);
err_hwsp:
- xe_bo_put(hwe->hwsp);
+ xe_bo_unpin_map_no_vm(hwe->hwsp);
err_name:
hwe->name = NULL;
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index af4518a82db2..9140b057a5ba 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -615,7 +615,11 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
lrc->flags = 0;
- lrc->bo = xe_bo_create_locked(xe, hwe->gt, vm,
+ /*
+ * FIXME: Perma-pinning LRC as we don't yet support moving GGTT address
+ * via VM bind calls.
+ */
+ lrc->bo = xe_bo_create_pin_map(xe, hwe->gt, vm,
ring_size + xe_lrc_size(xe, hwe->class),
ttm_bo_type_kernel,
XE_BO_CREATE_VRAM_IF_DGFX(hwe->gt) |
@@ -628,21 +632,6 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
else
lrc->full_gt = hwe->gt;
- /*
- * FIXME: Perma-pinning LRC as we don't yet support moving GGTT address
- * via VM bind calls.
- */
- err = xe_bo_pin(lrc->bo);
- if (err)
- goto err_unlock_put_bo;
- lrc->flags |= XE_LRC_PINNED;
-
- err = xe_bo_vmap(lrc->bo);
- if (err)
- goto err_unpin_bo;
-
- xe_bo_unlock_vm_held(lrc->bo);
-
lrc->ring.size = ring_size;
lrc->ring.tail = 0;
@@ -652,8 +641,8 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
if (!gt->default_lrc[hwe->class]) {
init_data = empty_lrc_data(hwe);
if (!init_data) {
- xe_lrc_finish(lrc);
- return -ENOMEM;
+ err = -ENOMEM;
+ goto err_lrc_finish;
}
}
@@ -710,12 +699,8 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
return 0;
-err_unpin_bo:
- if (lrc->flags & XE_LRC_PINNED)
- xe_bo_unpin(lrc->bo);
-err_unlock_put_bo:
- xe_bo_unlock_vm_held(lrc->bo);
- xe_bo_put(lrc->bo);
+err_lrc_finish:
+ xe_lrc_finish(lrc);
return err;
}
@@ -724,17 +709,15 @@ void xe_lrc_finish(struct xe_lrc *lrc)
struct ww_acquire_ctx ww;
xe_hw_fence_ctx_finish(&lrc->fence_ctx);
- if (lrc->flags & XE_LRC_PINNED) {
- if (lrc->bo->vm)
- xe_vm_lock(lrc->bo->vm, &ww, 0, false);
- else
- xe_bo_lock_no_vm(lrc->bo, NULL);
- xe_bo_unpin(lrc->bo);
- if (lrc->bo->vm)
- xe_vm_unlock(lrc->bo->vm, &ww);
- else
- xe_bo_unlock_no_vm(lrc->bo);
- }
+ if (lrc->bo->vm)
+ xe_vm_lock(lrc->bo->vm, &ww, 0, false);
+ else
+ xe_bo_lock_no_vm(lrc->bo, NULL);
+ xe_bo_unpin(lrc->bo);
+ if (lrc->bo->vm)
+ xe_vm_unlock(lrc->bo->vm, &ww);
+ else
+ xe_bo_unlock_no_vm(lrc->bo);
xe_bo_put(lrc->bo);
}
diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h
index 2827efa2091d..8fe08535873d 100644
--- a/drivers/gpu/drm/xe/xe_lrc_types.h
+++ b/drivers/gpu/drm/xe/xe_lrc_types.h
@@ -25,7 +25,6 @@ struct xe_lrc {
/** @flags: LRC flags */
u32 flags;
-#define XE_LRC_PINNED BIT(1)
/** @ring: submission ring state */
struct {
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 04/14] drm/xe/bo: explicitly reject zero sized BO
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (2 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 03/14] drm/xe: prefer xe_bo_create_pin_map() Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 05/14] drm/xe/mmio: s/lmem/vram/ Matthew Auld
` (9 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
In the depths of ttm, when allocating the vma node this should result in
-ENOSPC it seems. However we should probably rather reject as part of
our own ioctl sanity checking (and return or more sensible error), and
then treat as programmer error in the lower levels.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index da6df74e71b9..2bfd3f6f2e9a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -961,6 +961,9 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
/* Only kernel objects should set GT */
XE_BUG_ON(gt && type != ttm_bo_type_kernel);
+ if (XE_WARN_ON(!size))
+ return ERR_PTR(-EINVAL);
+
if (!bo) {
bo = xe_bo_alloc();
if (IS_ERR(bo))
@@ -1515,6 +1518,9 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_ERR(xe, args->handle))
return -EINVAL;
+ if (XE_IOCTL_ERR(xe, !args->size))
+ return -EINVAL;
+
if (XE_IOCTL_ERR(xe, args->size > SIZE_MAX))
return -EINVAL;
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 05/14] drm/xe/mmio: s/lmem/vram/
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (3 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 04/14] drm/xe/bo: explicitly reject zero sized BO Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 06/14] drm/xe/vram: start tracking the io_size Matthew Auld
` (8 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
This seems to be the preferred nomenclature in xe.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_mmio.c | 40 +++++++++++++++++-----------------
drivers/gpu/drm/xe/xe_module.c | 6 ++---
drivers/gpu/drm/xe/xe_module.h | 2 +-
3 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index 65b0df9bb579..e5bd4609aaee 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -68,7 +68,7 @@ _resize_bar(struct xe_device *xe, int resno, resource_size_t size)
return 1;
}
-static int xe_resize_lmem_bar(struct xe_device *xe, resource_size_t lmem_size)
+static int xe_resize_vram_bar(struct xe_device *xe, resource_size_t vram_size)
{
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
struct pci_bus *root = pdev->bus;
@@ -78,31 +78,31 @@ static int xe_resize_lmem_bar(struct xe_device *xe, resource_size_t lmem_size)
u32 pci_cmd;
int i;
int ret;
- u64 force_lmem_bar_size = xe_force_lmem_bar_size;
+ u64 force_vram_bar_size = xe_force_vram_bar_size;
current_size = roundup_pow_of_two(pci_resource_len(pdev, GEN12_LMEM_BAR));
- if (force_lmem_bar_size) {
+ if (force_vram_bar_size) {
u32 bar_sizes;
- rebar_size = force_lmem_bar_size * (resource_size_t)SZ_1M;
+ rebar_size = force_vram_bar_size * (resource_size_t)SZ_1M;
bar_sizes = pci_rebar_get_possible_sizes(pdev, GEN12_LMEM_BAR);
if (rebar_size == current_size)
return 0;
if (!(bar_sizes & BIT(pci_rebar_bytes_to_size(rebar_size))) ||
- rebar_size >= roundup_pow_of_two(lmem_size)) {
- rebar_size = lmem_size;
+ rebar_size >= roundup_pow_of_two(vram_size)) {
+ rebar_size = vram_size;
drm_info(&xe->drm,
"Given bar size is not within supported size, setting it to default: %llu\n",
- (u64)lmem_size >> 20);
+ (u64)vram_size >> 20);
}
} else {
rebar_size = current_size;
- if (rebar_size != roundup_pow_of_two(lmem_size))
- rebar_size = lmem_size;
+ if (rebar_size != roundup_pow_of_two(vram_size))
+ rebar_size = vram_size;
else
return 0;
}
@@ -117,7 +117,7 @@ static int xe_resize_lmem_bar(struct xe_device *xe, resource_size_t lmem_size)
}
if (!root_res) {
- drm_info(&xe->drm, "Can't resize LMEM BAR - platform support is missing\n");
+ drm_info(&xe->drm, "Can't resize VRAM BAR - platform support is missing\n");
return -1;
}
@@ -168,7 +168,7 @@ int xe_mmio_total_vram_size(struct xe_device *xe, u64 *vram_size, u64 *usable_si
if (usable_size) {
reg = xe_gt_mcr_unicast_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
*usable_size = (u64)REG_FIELD_GET(GENMASK(31, 8), reg) * SZ_64K;
- drm_info(&xe->drm, "lmem_size: 0x%llx usable_size: 0x%llx\n",
+ drm_info(&xe->drm, "vram_size: 0x%llx usable_size: 0x%llx\n",
*vram_size, *usable_size);
}
@@ -180,7 +180,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
struct xe_gt *gt;
u8 id;
- u64 lmem_size;
+ u64 vram_size;
u64 original_size;
u64 current_size;
u64 usable_size;
@@ -207,29 +207,29 @@ int xe_mmio_probe_vram(struct xe_device *xe)
gt = xe_device_get_gt(xe, 0);
original_size = pci_resource_len(pdev, GEN12_LMEM_BAR);
- err = xe_mmio_total_vram_size(xe, &lmem_size, &usable_size);
+ err = xe_mmio_total_vram_size(xe, &vram_size, &usable_size);
if (err)
return err;
- resize_result = xe_resize_lmem_bar(xe, lmem_size);
+ resize_result = xe_resize_vram_bar(xe, vram_size);
current_size = pci_resource_len(pdev, GEN12_LMEM_BAR);
xe->mem.vram.io_start = pci_resource_start(pdev, GEN12_LMEM_BAR);
- xe->mem.vram.size = min(current_size, lmem_size);
+ xe->mem.vram.size = min(current_size, vram_size);
if (!xe->mem.vram.size)
return -EIO;
if (resize_result > 0)
- drm_info(&xe->drm, "Successfully resize LMEM from %lluMiB to %lluMiB\n",
+ drm_info(&xe->drm, "Successfully resize VRAM from %lluMiB to %lluMiB\n",
(u64)original_size >> 20,
(u64)current_size >> 20);
- else if (xe->mem.vram.size < lmem_size && !xe_force_lmem_bar_size)
+ else if (xe->mem.vram.size < vram_size && !xe_force_vram_bar_size)
drm_info(&xe->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' support in your BIOS.\n",
(u64)xe->mem.vram.size >> 20);
- if (xe->mem.vram.size < lmem_size)
+ if (xe->mem.vram.size < vram_size)
drm_warn(&xe->drm, "Restricting VRAM size to PCI resource size (0x%llx->0x%llx)\n",
- lmem_size, (u64)xe->mem.vram.size);
+ vram_size, (u64)xe->mem.vram.size);
xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.size);
xe->mem.vram.size = min_t(u64, xe->mem.vram.size, usable_size);
@@ -360,7 +360,7 @@ int xe_mmio_init(struct xe_device *xe)
* and we should not continue with driver initialization.
*/
if (IS_DGFX(xe) && !(xe_mmio_read32(gt, GU_CNTL.reg) & LMEM_INIT)) {
- drm_err(&xe->drm, "LMEM not initialized by firmware\n");
+ drm_err(&xe->drm, "VRAM not initialized by firmware\n");
return -ENODEV;
}
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 5a51a9959eff..6860586ce7f8 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -22,9 +22,9 @@ bool enable_display = true;
module_param_named(enable_display, enable_display, bool, 0444);
MODULE_PARM_DESC(enable_display, "Enable display");
-u32 xe_force_lmem_bar_size;
-module_param_named(lmem_bar_size, xe_force_lmem_bar_size, uint, 0600);
-MODULE_PARM_DESC(lmem_bar_size, "Set the lmem bar size(in MiB)");
+u32 xe_force_vram_bar_size;
+module_param_named(vram_bar_size, xe_force_vram_bar_size, uint, 0600);
+MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size(in MiB)");
int xe_guc_log_level = 5;
module_param_named(guc_log_level, xe_guc_log_level, int, 0600);
diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
index 2c6ee46f5595..86916c176382 100644
--- a/drivers/gpu/drm/xe/xe_module.h
+++ b/drivers/gpu/drm/xe/xe_module.h
@@ -8,6 +8,6 @@
/* Module modprobe variables */
extern bool enable_guc;
extern bool enable_display;
-extern u32 xe_force_lmem_bar_size;
+extern u32 xe_force_vram_bar_size;
extern int xe_guc_log_level;
extern char *xe_param_force_probe;
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 06/14] drm/xe/vram: start tracking the io_size
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (4 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 05/14] drm/xe/mmio: s/lmem/vram/ Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 07/14] drm/xe/buddy: remove the virtualized start Matthew Auld
` (7 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
First step towards supporting small-bar is to track the io_size for
vram. We can longer assume that the io_size == vram size. This way we
know how much is CPU accessible via the BAR, and how much is not.
Effectively giving us a two tiered vram, where in some later patches we
can support different allocation strategies depending on if the memory
needs to be CPU accessible or not.
Note as this stage we still clamp the vram size to the usable vram size.
Only in the final patch do we turn this on for real, and allow distinct
io_size and vram_size.
v2: (Lucas):
- Improve the commit message, plus improve the kernel-doc for the
io_size to give a better sense of what it actually is.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device_types.h | 14 +++++++--
drivers/gpu/drm/xe/xe_gt_types.h | 14 +++++++--
drivers/gpu/drm/xe/xe_mmio.c | 44 ++++++++++++++++++++--------
3 files changed, 55 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9743987fc883..9e998b4738e1 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -173,9 +173,19 @@ struct xe_device {
struct {
/** @vram: VRAM info for device */
struct {
- /** @io_start: start address of VRAM */
+ /** @io_start: IO start address of VRAM */
resource_size_t io_start;
- /** @size: size of VRAM */
+ /**
+ * @io_size: IO size of VRAM.
+ *
+ * This represents how much of VRAM we can access via
+ * the CPU through the VRAM BAR. This can be smaller
+ * than @size, in which case only part of VRAM is CPU
+ * accessible (typically the first 256M). This
+ * configuration is known as small-bar.
+ */
+ resource_size_t io_size;
+ /** @size: Total size of VRAM */
resource_size_t size;
/** @mapping: pointer to VRAM mappable space */
void *__iomem mapping;
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index b01edd3fdc4d..00c43f3a33a2 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -143,9 +143,19 @@ struct xe_gt {
* (virtual split), can be subset of global device VRAM
*/
struct {
- /** @io_start: start address of VRAM */
+ /** @io_start: IO start address of this VRAM instance */
resource_size_t io_start;
- /** @size: size of VRAM */
+ /**
+ * @io_size: IO size of this VRAM instance
+ *
+ * This represents how much of this VRAM we can access
+ * via the CPU through the VRAM BAR. This can be smaller
+ * than @size, in which case only part of VRAM is CPU
+ * accessible (typically the first 256M). This
+ * configuration is known as small-bar.
+ */
+ resource_size_t io_size;
+ /** @size: size of VRAM. */
resource_size_t size;
/** @mapping: pointer to VRAM mappable space */
void *__iomem mapping;
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index e5bd4609aaee..5cacaa05759a 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -182,7 +182,6 @@ int xe_mmio_probe_vram(struct xe_device *xe)
u8 id;
u64 vram_size;
u64 original_size;
- u64 current_size;
u64 usable_size;
int resize_result, err;
@@ -190,11 +189,13 @@ int xe_mmio_probe_vram(struct xe_device *xe)
xe->mem.vram.mapping = 0;
xe->mem.vram.size = 0;
xe->mem.vram.io_start = 0;
+ xe->mem.vram.io_size = 0;
for_each_gt(gt, xe, id) {
gt->mem.vram.mapping = 0;
gt->mem.vram.size = 0;
gt->mem.vram.io_start = 0;
+ gt->mem.vram.io_size = 0;
}
return 0;
}
@@ -212,10 +213,10 @@ int xe_mmio_probe_vram(struct xe_device *xe)
return err;
resize_result = xe_resize_vram_bar(xe, vram_size);
- current_size = pci_resource_len(pdev, GEN12_LMEM_BAR);
xe->mem.vram.io_start = pci_resource_start(pdev, GEN12_LMEM_BAR);
-
- xe->mem.vram.size = min(current_size, vram_size);
+ xe->mem.vram.io_size = min(usable_size,
+ pci_resource_len(pdev, GEN12_LMEM_BAR));
+ xe->mem.vram.size = xe->mem.vram.io_size;
if (!xe->mem.vram.size)
return -EIO;
@@ -223,15 +224,15 @@ int xe_mmio_probe_vram(struct xe_device *xe)
if (resize_result > 0)
drm_info(&xe->drm, "Successfully resize VRAM from %lluMiB to %lluMiB\n",
(u64)original_size >> 20,
- (u64)current_size >> 20);
- else if (xe->mem.vram.size < vram_size && !xe_force_vram_bar_size)
+ (u64)xe->mem.vram.io_size >> 20);
+ else if (xe->mem.vram.io_size < usable_size && !xe_force_vram_bar_size)
drm_info(&xe->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' support in your BIOS.\n",
(u64)xe->mem.vram.size >> 20);
if (xe->mem.vram.size < vram_size)
drm_warn(&xe->drm, "Restricting VRAM size to PCI resource size (0x%llx->0x%llx)\n",
vram_size, (u64)xe->mem.vram.size);
- xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.size);
+ xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.io_size);
xe->mem.vram.size = min_t(u64, xe->mem.vram.size, usable_size);
drm_info(&xe->drm, "TOTAL VRAM: %pa, %pa\n", &xe->mem.vram.io_start, &xe->mem.vram.size);
@@ -239,7 +240,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
/* FIXME: Assuming equally partitioned VRAM, incorrect */
if (xe->info.tile_count > 1) {
u8 adj_tile_count = xe->info.tile_count;
- resource_size_t size, io_start;
+ resource_size_t size, io_start, io_size;
for_each_gt(gt, xe, id)
if (xe_gt_is_media_type(gt))
@@ -249,15 +250,31 @@ int xe_mmio_probe_vram(struct xe_device *xe)
size = xe->mem.vram.size / adj_tile_count;
io_start = xe->mem.vram.io_start;
+ io_size = xe->mem.vram.io_size;
for_each_gt(gt, xe, id) {
- if (id && !xe_gt_is_media_type(gt))
- io_start += size;
+ if (id && !xe_gt_is_media_type(gt)) {
+ io_size -= min(io_size, size);
+ io_start += io_size;
+ }
gt->mem.vram.size = size;
- gt->mem.vram.io_start = io_start;
- gt->mem.vram.mapping = xe->mem.vram.mapping +
- (io_start - xe->mem.vram.io_start);
+
+ /*
+ * XXX: multi-tile small-bar might be wild. Hopefully
+ * full tile without any mappable vram is not something
+ * we care about.
+ */
+
+ gt->mem.vram.io_size = min(size, io_size);
+ if (io_size) {
+ gt->mem.vram.io_start = io_start;
+ gt->mem.vram.mapping = xe->mem.vram.mapping +
+ (io_start - xe->mem.vram.io_start);
+ } else {
+ drm_err(&xe->drm, "Tile without any CPU visible VRAM. Aborting.\n");
+ return -ENODEV;
+ }
drm_info(&xe->drm, "VRAM[%u, %u]: %pa, %pa\n",
id, gt->info.vram_id, >->mem.vram.io_start,
@@ -266,6 +283,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
} else {
gt->mem.vram.size = xe->mem.vram.size;
gt->mem.vram.io_start = xe->mem.vram.io_start;
+ gt->mem.vram.io_size = xe->mem.vram.io_size;
gt->mem.vram.mapping = xe->mem.vram.mapping;
drm_info(&xe->drm, "VRAM: %pa\n", >->mem.vram.size);
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 07/14] drm/xe/buddy: remove the virtualized start
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (5 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 06/14] drm/xe/vram: start tracking the io_size Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking Matthew Auld
` (6 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
Hopefully not needed anymore. We can add a .compatible() hook once we
need to differentiate between mappable and non-mappable vram. If the
allocation is not contiguous then the start value is kind of
meaningless, so rather just mark as invalid.
In upstream, TTM wants to eventually remove the ttm_resource.start
usage.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 6 ++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 29 ++++++++++++++--------------
2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 2bfd3f6f2e9a..7b331314064c 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -662,6 +662,12 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
void *new_addr = gt->mem.vram.mapping +
(new_mem->start << PAGE_SHIFT);
+ if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
+ ret = -EINVAL;
+ xe_device_mem_access_put(xe);
+ goto out;
+ }
+
XE_BUG_ON(new_mem->start !=
bo->placements->fpfn);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 643365b18bc7..e3ac8c6d3978 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -54,7 +54,6 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
struct xe_ttm_vram_mgr_resource *vres;
u64 size, remaining_size, lpfn, fpfn;
struct drm_buddy *mm = &mgr->mm;
- struct drm_buddy_block *block;
unsigned long pages_per_block;
int r;
@@ -186,24 +185,24 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
list_splice_tail(trim_list, &vres->blocks);
}
- vres->base.start = 0;
- list_for_each_entry(block, &vres->blocks, link) {
- unsigned long start;
+ if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
+ xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
+ vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
- start = drm_buddy_block_offset(block) +
- drm_buddy_block_size(mm, block);
- start >>= PAGE_SHIFT;
+ /*
+ * For some kernel objects we still rely on the start when io mapping
+ * the object.
+ */
+ if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) {
+ struct drm_buddy_block *block = list_first_entry(&vres->blocks,
+ typeof(*block),
+ link);
- if (start > PFN_UP(vres->base.size))
- start -= PFN_UP(vres->base.size);
- else
- start = 0;
- vres->base.start = max(vres->base.start, start);
+ vres->base.start = drm_buddy_block_offset(block) >> PAGE_SHIFT;
+ } else {
+ vres->base.start = XE_BO_INVALID_OFFSET;
}
- if (xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
- vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
-
*res = &vres->base;
return 0;
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (6 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 07/14] drm/xe/buddy: remove the virtualized start Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 14:24 ` Maarten Lankhorst
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 09/14] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
` (5 subsequent siblings)
13 siblings, 1 reply; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
Replace the allocation code with the i915 version. This simplifies the
code a little, and importantly we get the accounting at the mgr level,
which is useful for debug (and maybe userspace), plus per resource
tracking so we can easily check if a resource is using one or pages in
the mappable part of vram (useful for eviction), or if the resource is
completely within the mappable portion (useful for checking if the
resource can be safely CPU mapped).
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 18 +-
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 201 ++++++++++-----------
drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 3 +-
drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 6 +
4 files changed, 118 insertions(+), 110 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index 2e8d07ad42ae..33bbd72711b3 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -144,7 +144,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
{
struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL);
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
- u64 stolen_size, pgsize;
+ u64 stolen_size, io_size, pgsize;
int err;
if (IS_DGFX(xe))
@@ -163,7 +163,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
if (pgsize < PAGE_SIZE)
pgsize = PAGE_SIZE;
- err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size, pgsize);
+ /*
+ * We don't try to attempt partial visible support for stolen vram,
+ * since stolen is always at the end of vram, and the BAR size is pretty
+ * much always 256M, with small-bar.
+ */
+ io_size = 0;
+ if (!xe_ttm_stolen_cpu_inaccessible(xe))
+ io_size = stolen_size;
+
+ err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size,
+ io_size, pgsize);
if (err) {
drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
return;
@@ -172,8 +182,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
drm_dbg_kms(&xe->drm, "Initialized stolen memory support with %llu bytes\n",
stolen_size);
- if (!xe_ttm_stolen_cpu_inaccessible(xe))
- mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, stolen_size);
+ if (io_size)
+ mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, io_size);
}
u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index e3ac8c6d3978..f703e962f499 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
const struct ttm_place *place,
struct ttm_resource **res)
{
- u64 max_bytes, cur_size, min_block_size;
struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
struct xe_ttm_vram_mgr_resource *vres;
- u64 size, remaining_size, lpfn, fpfn;
struct drm_buddy *mm = &mgr->mm;
- unsigned long pages_per_block;
- int r;
-
- lpfn = (u64)place->lpfn << PAGE_SHIFT;
- if (!lpfn || lpfn > man->size)
- lpfn = man->size;
-
- fpfn = (u64)place->fpfn << PAGE_SHIFT;
+ u64 size, remaining_size, min_page_size;
+ unsigned long lpfn;
+ int err;
- max_bytes = mgr->manager.size;
- if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
- pages_per_block = ~0ul;
- } else {
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- pages_per_block = HPAGE_PMD_NR;
-#else
- /* default to 2MB */
- pages_per_block = 2UL << (20UL - PAGE_SHIFT);
-#endif
-
- pages_per_block = max_t(uint32_t, pages_per_block,
- tbo->page_alignment);
- }
+ lpfn = place->lpfn;
+ if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
+ lpfn = man->size >> PAGE_SHIFT;
vres = kzalloc(sizeof(*vres), GFP_KERNEL);
if (!vres)
return -ENOMEM;
ttm_resource_init(tbo, place, &vres->base);
- remaining_size = vres->base.size;
/* bail out quickly if there's likely not enough VRAM for this BO */
- if (ttm_resource_manager_usage(man) > max_bytes) {
- r = -ENOSPC;
+ if (ttm_resource_manager_usage(man) > man->size) {
+ err = -ENOSPC;
goto error_fini;
}
@@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
if (place->flags & TTM_PL_FLAG_TOPDOWN)
vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
- if (fpfn || lpfn != man->size)
- /* Allocate blocks in desired range */
+ if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
- mutex_lock(&mgr->lock);
- while (remaining_size) {
- if (tbo->page_alignment)
- min_block_size = tbo->page_alignment << PAGE_SHIFT;
- else
- min_block_size = mgr->default_page_size;
-
- XE_BUG_ON(min_block_size < mm->chunk_size);
-
- /* Limit maximum size to 2GiB due to SG table limitations */
- size = min(remaining_size, 2ULL << 30);
-
- if (size >= pages_per_block << PAGE_SHIFT)
- min_block_size = pages_per_block << PAGE_SHIFT;
-
- cur_size = size;
-
- if (fpfn + size != place->lpfn << PAGE_SHIFT) {
- /*
- * Except for actual range allocation, modify the size and
- * min_block_size conforming to continuous flag enablement
- */
- if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
- size = roundup_pow_of_two(size);
- min_block_size = size;
- /*
- * Modify the size value if size is not
- * aligned with min_block_size
- */
- } else if (!IS_ALIGNED(size, min_block_size)) {
- size = round_up(size, min_block_size);
- }
- }
+ XE_BUG_ON(!vres->base.size);
+ size = vres->base.size;
- r = drm_buddy_alloc_blocks(mm, fpfn,
- lpfn,
- size,
- min_block_size,
- &vres->blocks,
- vres->flags);
- if (unlikely(r))
- goto error_free_blocks;
+ min_page_size = mgr->default_page_size;
+ if (tbo->page_alignment)
+ min_page_size = tbo->page_alignment << PAGE_SHIFT;
- if (size > remaining_size)
- remaining_size = 0;
- else
- remaining_size -= size;
+ XE_BUG_ON(min_page_size < mm->chunk_size);
+ XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
+ XE_BUG_ON(size > SZ_2G &&
+ (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
+ XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
+
+ if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
+ place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+ unsigned long pages;
+
+ size = roundup_pow_of_two(size);
+ min_page_size = size;
+
+ pages = size >> ilog2(mm->chunk_size);
+ if (pages > lpfn)
+ lpfn = pages;
}
- mutex_unlock(&mgr->lock);
- if (cur_size != size) {
- struct drm_buddy_block *block;
- struct list_head *trim_list;
- u64 original_size;
- LIST_HEAD(temp);
+ if (size > lpfn << PAGE_SHIFT) {
+ err = -E2BIG; /* don't trigger eviction */
+ goto error_fini;
+ }
- trim_list = &vres->blocks;
- original_size = vres->base.size;
+ mutex_lock(&mgr->lock);
+ if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
+ mutex_unlock(&mgr->lock);
+ err = -ENOSPC;
+ goto error_fini;
+ }
+ remaining_size = size;
+ do {
/*
- * If size value is rounded up to min_block_size, trim the last
- * block to the required size
+ * Limit maximum size to 2GiB due to SG table limitations.
+ * FIXME: Should maybe be handled as part of sg construction.
*/
- if (!list_is_singular(&vres->blocks)) {
- block = list_last_entry(&vres->blocks, typeof(*block), link);
- list_move_tail(&block->link, &temp);
- trim_list = &temp;
- /*
- * Compute the original_size value by subtracting the
- * last block size with (aligned size - original size)
- */
- original_size = drm_buddy_block_size(mm, block) -
- (size - cur_size);
- }
+ u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
+
+ err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
+ (u64)lpfn << PAGE_SHIFT,
+ alloc_size,
+ min_page_size,
+ &vres->blocks,
+ vres->flags);
+ if (err)
+ goto error_free_blocks;
- mutex_lock(&mgr->lock);
- drm_buddy_block_trim(mm,
- original_size,
- trim_list);
- mutex_unlock(&mgr->lock);
+ remaining_size -= alloc_size;
+ } while (remaining_size);
- if (!list_empty(&temp))
- list_splice_tail(trim_list, &vres->blocks);
+ if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+ if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
+ size = vres->base.size;
}
+ if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
+ vres->used_visible_size = size;
+ } else {
+ struct drm_buddy_block *block;
+
+ list_for_each_entry(block, &vres->blocks, link) {
+ u64 start = drm_buddy_block_offset(block);
+
+ if (start < mgr->visible_size) {
+ u64 end = start + drm_buddy_block_size(mm, block);
+
+ vres->used_visible_size +=
+ min(end, mgr->visible_size) - start;
+ }
+ }
+ }
+
+ mgr->visible_avail -= vres->used_visible_size;
+ mutex_unlock(&mgr->lock);
+
if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
@@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
ttm_resource_fini(man, &vres->base);
kfree(vres);
- return r;
+ return err;
}
static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
@@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
mutex_lock(&mgr->lock);
drm_buddy_free_list(mm, &vres->blocks);
+ mgr->visible_avail += vres->used_visible_size;
mutex_unlock(&mgr->lock);
ttm_resource_fini(man, res);
@@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
struct drm_buddy *mm = &mgr->mm;
mutex_lock(&mgr->lock);
+ drm_printf(printer, "default_page_size: %lluKiB\n",
+ mgr->default_page_size >> 10);
+ drm_printf(printer, "visible_avail: %lluMiB\n",
+ (u64)mgr->visible_avail >> 20);
+ drm_printf(printer, "visible_size: %lluMiB\n",
+ (u64)mgr->visible_size >> 20);
+
drm_buddy_print(mm, printer);
mutex_unlock(&mgr->lock);
drm_printf(printer, "man size:%llu\n", man->size);
@@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
if (ttm_resource_manager_evict_all(&xe->ttm, man))
return;
+ WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
+
drm_buddy_fini(&mgr->mm);
ttm_resource_manager_cleanup(&mgr->manager);
@@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
}
int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
- u32 mem_type, u64 size, u64 default_page_size)
+ u32 mem_type, u64 size, u64 io_size,
+ u64 default_page_size)
{
struct ttm_resource_manager *man = &mgr->manager;
int err;
@@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
mgr->mem_type = mem_type;
mutex_init(&mgr->lock);
mgr->default_page_size = default_page_size;
+ mgr->visible_size = io_size;
+ mgr->visible_avail = io_size;
ttm_resource_manager_init(man, &xe->ttm, size);
err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
@@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr)
mgr->gt = gt;
return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 + gt->info.vram_id,
- gt->mem.vram.size, PAGE_SIZE);
+ gt->mem.vram.size, gt->mem.vram.io_size,
+ PAGE_SIZE);
}
int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 78f332d26224..35e5367a79fb 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -13,7 +13,8 @@ struct xe_device;
struct xe_gt;
int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
- u32 mem_type, u64 size, u64 default_page_size);
+ u32 mem_type, u64 size, u64 io_size,
+ u64 default_page_size);
int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr);
int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
struct ttm_resource *res,
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index 39aa2ec1b968..3d9417ff7434 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
struct ttm_resource_manager manager;
/** @mm: DRM buddy allocator which manages the VRAM */
struct drm_buddy mm;
+ /** @visible_size: Proped size of the CPU visible portion */
+ u64 visible_size;
+ /** @visible_avail: CPU visible portion still unallocated */
+ u64 visible_avail;
/** @default_page_size: default page size */
u64 default_page_size;
/** @lock: protects allocations of VRAM */
@@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
struct ttm_resource base;
/** @blocks: list of DRM buddy blocks */
struct list_head blocks;
+ /** @used_visible_size: How many CPU visible bytes this resource is using */
+ u64 used_visible_size;
/** @flags: flags associated with the resource */
unsigned long flags;
};
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 09/14] drm/xe/buddy: add compatible and intersects hooks
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (7 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 10/14] drm/xe/bo: support tiered vram allocation for small-bar Matthew Auld
` (4 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
Copy this from i915. We need .compatible for lmem -> lmem transfers, so
they don't just get nooped by ttm, if need to move something from
mappable to non-mappble or vice versa. The .intersects is needed for
eviction, to determine if a victim resource is worth eviction. e.g if we
need mappable space there is no point in evicting a resource that has
zero mappable pages.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 62 ++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index f703e962f499..8dd33ac65499 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -230,9 +230,71 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
drm_printf(printer, "man size:%llu\n", man->size);
}
+static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+ struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+ struct xe_ttm_vram_mgr_resource *vres =
+ to_xe_ttm_vram_mgr_resource(res);
+ struct drm_buddy *mm = &mgr->mm;
+ struct drm_buddy_block *block;
+
+ if (!place->fpfn && !place->lpfn)
+ return true;
+
+ if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
+ return vres->used_visible_size > 0;
+
+ list_for_each_entry(block, &vres->blocks, link) {
+ unsigned long fpfn =
+ drm_buddy_block_offset(block) >> PAGE_SHIFT;
+ unsigned long lpfn = fpfn +
+ (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+
+ if (place->fpfn < lpfn && place->lpfn > fpfn)
+ return true;
+ }
+
+ return false;
+}
+
+static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
+ struct ttm_resource *res,
+ const struct ttm_place *place,
+ size_t size)
+{
+ struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+ struct xe_ttm_vram_mgr_resource *vres =
+ to_xe_ttm_vram_mgr_resource(res);
+ struct drm_buddy *mm = &mgr->mm;
+ struct drm_buddy_block *block;
+
+ if (!place->fpfn && !place->lpfn)
+ return true;
+
+ if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
+ return vres->used_visible_size == size;
+
+ list_for_each_entry(block, &vres->blocks, link) {
+ unsigned long fpfn =
+ drm_buddy_block_offset(block) >> PAGE_SHIFT;
+ unsigned long lpfn = fpfn +
+ (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+
+ if (fpfn < place->fpfn || lpfn > place->lpfn)
+ return false;
+ }
+
+ return true;
+}
+
static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
.alloc = xe_ttm_vram_mgr_new,
.free = xe_ttm_vram_mgr_del,
+ .intersects = xe_ttm_vram_mgr_intersects,
+ .compatible = xe_ttm_vram_mgr_compatible,
.debug = xe_ttm_vram_mgr_debug
};
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 10/14] drm/xe/bo: support tiered vram allocation for small-bar
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (8 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 09/14] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 11/14] drm/xe/migrate: retain CCS aux state for vram -> vram Matthew Auld
` (3 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
Add the new flag XE_BO_NEEDS_CPU_ACCESS, to force allocating in the
mappable part of lmem. If no flag is specified we do a topdown
allocation, to limit the chances of stealing the precious mappable part,
if we don't need it. If this is a full-bar system, then this all gets
nooped.
For kernel users, it looks like xe_bo_create_pin_map() is the central
place which users should call if they want CPU access to the object, so
add the flag there.
We still need to plumb this through for userspace allocations. Also it
looks like page-tables are using pin_map(), which is less than ideal. If
we can already use the GPU to do page-table management, then maybe we
should just force that for small-bar.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/tests/xe_migrate.c | 3 +-
drivers/gpu/drm/xe/xe_bo.c | 83 ++++++++++++++++++---------
drivers/gpu/drm/xe/xe_bo.h | 1 +
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 ++
4 files changed, 62 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 0de17e90aba9..b786d07710d3 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -95,7 +95,8 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
struct xe_bo *sysmem = xe_bo_create_locked(xe, m->gt, NULL,
bo->size,
ttm_bo_type_kernel,
- XE_BO_CREATE_SYSTEM_BIT);
+ XE_BO_CREATE_SYSTEM_BIT |
+ XE_BO_NEEDS_CPU_ACCESS);
if (IS_ERR(sysmem)) {
KUNIT_FAIL(test, "Failed to allocate sysmem bo for %s: %li\n",
str, PTR_ERR(sysmem));
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 7b331314064c..95ec6b34a28c 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -96,22 +96,30 @@ static void try_add_system(struct xe_bo *bo, struct ttm_place *places,
static void try_add_vram0(struct xe_device *xe, struct xe_bo *bo,
struct ttm_place *places, u32 bo_flags, u32 *c)
{
- struct xe_gt *gt;
-
if (bo_flags & XE_BO_CREATE_VRAM0_BIT) {
+ struct ttm_place place = {};
+ struct xe_gt *gt;
+ u64 io_size;
+
gt = mem_type_to_gt(xe, XE_PL_VRAM0);
+ io_size = gt->mem.vram.io_size;
XE_BUG_ON(!gt->mem.vram.size);
- places[*c] = (struct ttm_place) {
- .mem_type = XE_PL_VRAM0,
- /*
- * For eviction / restore on suspend / resume objects
- * pinned in VRAM must be contiguous
- */
- .flags = bo_flags & (XE_BO_CREATE_PINNED_BIT |
- XE_BO_CREATE_GGTT_BIT) ?
- TTM_PL_FLAG_CONTIGUOUS : 0,
- };
+ place.mem_type = XE_PL_VRAM0;
+
+ if (bo_flags & (XE_BO_CREATE_PINNED_BIT |
+ XE_BO_CREATE_GGTT_BIT))
+ place.flags |= TTM_PL_FLAG_CONTIGUOUS;
+
+ if (io_size < gt->mem.vram.size) {
+ if (bo_flags & XE_BO_NEEDS_CPU_ACCESS) {
+ place.fpfn = 0;
+ place.lpfn = io_size >> PAGE_SHIFT;
+ } else {
+ place.flags |= TTM_PL_FLAG_TOPDOWN;
+ }
+ }
+ places[*c] = place;
*c += 1;
if (bo->props.preferred_mem_type == XE_BO_PROPS_INVALID)
@@ -122,22 +130,30 @@ static void try_add_vram0(struct xe_device *xe, struct xe_bo *bo,
static void try_add_vram1(struct xe_device *xe, struct xe_bo *bo,
struct ttm_place *places, u32 bo_flags, u32 *c)
{
- struct xe_gt *gt;
-
if (bo_flags & XE_BO_CREATE_VRAM1_BIT) {
+ struct ttm_place place = {};
+ struct xe_gt *gt;
+ u64 io_size;
+
gt = mem_type_to_gt(xe, XE_PL_VRAM1);
+ io_size = gt->mem.vram.io_size;
XE_BUG_ON(!gt->mem.vram.size);
- places[*c] = (struct ttm_place) {
- .mem_type = XE_PL_VRAM1,
- /*
- * For eviction / restore on suspend / resume objects
- * pinned in VRAM must be contiguous
- */
- .flags = bo_flags & (XE_BO_CREATE_PINNED_BIT |
- XE_BO_CREATE_GGTT_BIT) ?
- TTM_PL_FLAG_CONTIGUOUS : 0,
- };
+ place.mem_type = XE_PL_VRAM1;
+
+ if (bo_flags & (XE_BO_CREATE_PINNED_BIT |
+ XE_BO_CREATE_GGTT_BIT))
+ place.flags |= TTM_PL_FLAG_CONTIGUOUS;
+
+ if (io_size < gt->mem.vram.size) {
+ if (bo_flags & XE_BO_NEEDS_CPU_ACCESS) {
+ place.fpfn = 0;
+ place.lpfn = io_size >> PAGE_SHIFT;
+ } else {
+ place.flags |= TTM_PL_FLAG_TOPDOWN;
+ }
+ }
+ places[*c] = place;
*c += 1;
if (bo->props.preferred_mem_type == XE_BO_PROPS_INVALID)
@@ -369,15 +385,22 @@ static int xe_ttm_io_mem_reserve(struct ttm_device *bdev,
struct ttm_resource *mem)
{
struct xe_device *xe = ttm_to_xe_device(bdev);
- struct xe_gt *gt;
switch (mem->mem_type) {
case XE_PL_SYSTEM:
case XE_PL_TT:
return 0;
case XE_PL_VRAM0:
- case XE_PL_VRAM1:
+ case XE_PL_VRAM1: {
+ struct xe_ttm_vram_mgr_resource *vres =
+ to_xe_ttm_vram_mgr_resource(mem);
+ struct xe_gt *gt;
+
+ if (vres->used_visible_size < mem->size)
+ return -EINVAL;
+
gt = mem_type_to_gt(xe, mem->mem_type);
+
mem->bus.offset = mem->start << PAGE_SHIFT;
if (gt->mem.vram.mapping &&
@@ -392,7 +415,7 @@ static int xe_ttm_io_mem_reserve(struct ttm_device *bdev,
mem->bus.caching = ttm_write_combined;
#endif
return 0;
- case XE_PL_STOLEN:
+ } case XE_PL_STOLEN:
return xe_ttm_stolen_io_mem_reserve(xe, mem);
default:
return -EINVAL;
@@ -1160,7 +1183,8 @@ struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_gt *gt,
xe_ttm_stolen_cpu_inaccessible(xe))
flags |= XE_BO_CREATE_GGTT_BIT;
- bo = xe_bo_create_locked_range(xe, gt, vm, size, start, end, type, flags);
+ bo = xe_bo_create_locked_range(xe, gt, vm, size, start, end, type,
+ flags | XE_BO_NEEDS_CPU_ACCESS);
if (IS_ERR(bo))
return bo;
@@ -1458,6 +1482,9 @@ int xe_bo_vmap(struct xe_bo *bo)
xe_bo_assert_held(bo);
+ if (!(bo->flags & XE_BO_NEEDS_CPU_ACCESS))
+ return -EINVAL;
+
if (!iosys_map_is_null(&bo->vmap))
return 0;
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 0699b2b4c5ca..c937ef10fcf3 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -28,6 +28,7 @@
#define XE_BO_DEFER_BACKING BIT(8)
#define XE_BO_SCANOUT_BIT BIT(9)
#define XE_BO_FIXED_PLACEMENT_BIT BIT(10)
+#define XE_BO_NEEDS_CPU_ACCESS BIT(11)
/* this one is trigger internally only */
#define XE_BO_INTERNAL_TEST BIT(30)
#define XE_BO_INTERNAL_64K BIT(31)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 8dd33ac65499..dd2fe543bc61 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -361,12 +361,16 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
enum dma_data_direction dir,
struct sg_table **sgt)
{
+ struct xe_ttm_vram_mgr_resource *vres = to_xe_ttm_vram_mgr_resource(res);
struct xe_gt *gt = xe_device_get_gt(xe, res->mem_type - XE_PL_VRAM0);
struct xe_res_cursor cursor;
struct scatterlist *sg;
int num_entries = 0;
int i, r;
+ if (vres->used_visible_size < res->size)
+ return -EOPNOTSUPP;
+
*sgt = kmalloc(sizeof(**sgt), GFP_KERNEL);
if (!*sgt)
return -ENOMEM;
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 11/14] drm/xe/migrate: retain CCS aux state for vram -> vram
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (9 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 10/14] drm/xe/bo: support tiered vram allocation for small-bar Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS Matthew Auld
` (2 subsequent siblings)
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
On small-bar we likely need to be able to deal with vram -> vram
transfers. During eviction, as an optimisation, we don't always
want/need to kick stuff into smem. Plus for some types of CCS surfaces,
where the clear color needs to be accessed from the CPU, we might need
to migrate it.
v2: (Lucas):
- s/lmem/vram/ in the commit message
- Tidy up the code a bit; use one emit_copy_ccs()
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index bc69ec17d5ad..94c85421e1a5 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -546,11 +546,19 @@ static u32 xe_migrate_ccs_copy(struct xe_migrate *m,
if (xe_device_has_flat_ccs(gt_to_xe(gt)) && !copy_ccs && dst_is_vram) {
/*
- * If the bo doesn't have any CCS metadata attached, we still
- * need to clear it for security reasons.
+ * If the src is already in vram, then it should already
+ * have been cleared by us, or has been populated by the
+ * user. Make sure we copy the CCS aux state as-is.
+ *
+ * Otherwise if the bo doesn't have any CCS metadata attached,
+ * we still need to clear it for security reasons.
*/
- emit_copy_ccs(gt, bb, dst_ofs, true, m->cleared_vram_ofs, false,
- dst_size);
+ u64 ccs_src_ofs = src_is_vram ? src_ofs : m->cleared_vram_ofs;
+
+ emit_copy_ccs(gt, bb,
+ dst_ofs, true,
+ ccs_src_ofs, src_is_vram, dst_size);
+
flush_flags = MI_FLUSH_DW_CCS;
} else if (copy_ccs) {
if (!src_is_vram)
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (10 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 11/14] drm/xe/migrate: retain CCS aux state for vram -> vram Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 14:48 ` Maarten Lankhorst
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 13/14] drm/xe/uapi: add the userspace bits for small-bar Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 14/14] drm/xe: fully turn on small-bar support Matthew Auld
13 siblings, 1 reply; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
The display code wants to read the clear color value from the buffer.
However if the buffer is the non-mappable part of lmem then we fail the
kmap. The simplest solution is to just mark the buffer with
XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
mappable part of lmem, or migrate it there.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 65c0bc28a3d1..66e1309e21d8 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -203,6 +203,14 @@ static struct i915_vma *__xe_pin_fb_vma(struct intel_framebuffer *fb,
if (ret)
goto err;
+ /*
+ * For this type of buffer we need to able to read from the CPU the
+ * clear color value found in the buffer. This doesn't do anything on
+ * non small-bar devices.
+ */
+ if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
+ bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
+
ret = xe_bo_validate(bo, NULL, true);
if (!ret)
ttm_bo_pin(&bo->ttm);
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 13/14] drm/xe/uapi: add the userspace bits for small-bar
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (11 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 14/14] drm/xe: fully turn on small-bar support Matthew Auld
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
Pretty much all same as i915. We add a new hint for userspace to force an
object into the mappable part of vram. Like i915 we also require smem as
a placement in case we need to spill.
We also need to tell userspace how large the mappable part is. In Vulkan
for example, there will be two vram heaps for small-bar systems. And
here the size of each heap needs to be known. Likewise the used/avail
tracking needs to account for the mappable part.
We also limit the available tracking going forward, and limit to
privileged users only, since these values are system wide and are
technically considered an info leak.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 17 +++++++++++++++--
drivers/gpu/drm/xe/xe_query.c | 22 +++++++++++++++++++---
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 18 ++++++++++++++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 4 ++++
include/uapi/drm/xe_drm.h | 5 ++++-
5 files changed, 60 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 95ec6b34a28c..d05693ea56a9 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -914,7 +914,6 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
ret = ttm_bo_vm_fault_reserved(vmf,
vmf->vma->vm_page_prot,
TTM_BO_VM_NUM_PREFAULT);
-
drm_dev_exit(idx);
} else {
ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
@@ -1541,6 +1540,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_ERR(xe, args->flags &
~(XE_GEM_CREATE_FLAG_DEFER_BACKING |
XE_GEM_CREATE_FLAG_SCANOUT |
+ XE_GEM_CREATE_FLAG_NEEDS_CPU_ACCESS |
xe->info.mem_region_mask)))
return -EINVAL;
@@ -1578,6 +1578,18 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
bo_flags |= XE_BO_SCANOUT_BIT;
bo_flags |= args->flags << (ffs(XE_BO_CREATE_SYSTEM_BIT) - 1);
+
+ if (args->flags & XE_GEM_CREATE_FLAG_NEEDS_CPU_ACCESS) {
+ if (XE_IOCTL_ERR(xe, !(bo_flags & (XE_BO_CREATE_VRAM0_BIT |
+ XE_BO_CREATE_VRAM1_BIT))))
+ return -EINVAL;
+
+ if (XE_IOCTL_ERR(xe, !(bo_flags & XE_BO_CREATE_SYSTEM_BIT)))
+ return -EINVAL;
+
+ bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
+ }
+
bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
bo_flags);
if (vm) {
@@ -1841,7 +1853,8 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
XE_BO_CREATE_VRAM_IF_DGFX(to_gt(xe)) |
- XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT);
+ XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
+ XE_BO_NEEDS_CPU_ACCESS);
if (IS_ERR(bo))
return PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 0f70945176f6..e94cad946507 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -16,6 +16,7 @@
#include "xe_gt.h"
#include "xe_guc_hwconfig.h"
#include "xe_macros.h"
+#include "xe_ttm_vram_mgr.h"
static const enum xe_engine_class xe_to_user_engine_class[] = {
[XE_ENGINE_CLASS_RENDER] = DRM_XE_ENGINE_CLASS_RENDER,
@@ -127,7 +128,10 @@ static int query_memory_usage(struct xe_device *xe,
usage->regions[0].min_page_size = PAGE_SIZE;
usage->regions[0].max_page_size = PAGE_SIZE;
usage->regions[0].total_size = man->size << PAGE_SHIFT;
- usage->regions[0].used = ttm_resource_manager_usage(man);
+ if (perfmon_capable())
+ usage->regions[0].used = ttm_resource_manager_usage(man);
+ else
+ usage->regions[0].used = usage->regions[0].total_size;
usage->num_regions = 1;
for (i = XE_PL_VRAM0; i <= XE_PL_VRAM1; ++i) {
@@ -144,8 +148,20 @@ static int query_memory_usage(struct xe_device *xe,
SZ_1G;
usage->regions[usage->num_regions].total_size =
man->size;
- usage->regions[usage->num_regions++].used =
- ttm_resource_manager_usage(man);
+
+ if (perfmon_capable()) {
+ xe_ttm_vram_get_used(man,
+ &usage->regions[usage->num_regions].used,
+ &usage->regions[usage->num_regions].cpu_visible_used);
+ } else {
+ usage->regions[usage->num_regions].used = man->size;
+ usage->regions[usage->num_regions].cpu_visible_used =
+ xe_ttm_vram_get_cpu_visible_size(man);
+ }
+
+ usage->regions[usage->num_regions].cpu_visible_size =
+ xe_ttm_vram_get_cpu_visible_size(man);
+ usage->num_regions++;
}
}
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index dd2fe543bc61..255fc31615c6 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -446,3 +446,21 @@ void xe_ttm_vram_mgr_free_sgt(struct device *dev, enum dma_data_direction dir,
sg_free_table(sgt);
kfree(sgt);
}
+
+u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man)
+{
+ struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+
+ return mgr->visible_size;
+}
+
+void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
+ u64 *used, u64 *used_visible)
+{
+ struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+
+ mutex_lock(&mgr->lock);
+ *used = mgr->mm.size - mgr->mm.avail;
+ *used_visible = mgr->visible_size - mgr->visible_avail;
+ mutex_unlock(&mgr->lock);
+}
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 35e5367a79fb..27f43490fa11 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -25,6 +25,10 @@ int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
void xe_ttm_vram_mgr_free_sgt(struct device *dev, enum dma_data_direction dir,
struct sg_table *sgt);
+u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
+void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
+ u64 *used, u64 *used_visible);
+
static inline struct xe_ttm_vram_mgr_resource *
to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
{
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 593b01ba5919..525ab9b6b282 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -169,7 +169,9 @@ struct drm_xe_query_mem_usage {
__u32 max_page_size;
__u64 total_size;
__u64 used;
- __u64 reserved[8];
+ __u64 cpu_visible_size;
+ __u64 cpu_visible_used;
+ __u64 reserved[6];
} regions[];
};
@@ -270,6 +272,7 @@ struct drm_xe_gem_create {
*/
#define XE_GEM_CREATE_FLAG_DEFER_BACKING (0x1 << 24)
#define XE_GEM_CREATE_FLAG_SCANOUT (0x1 << 25)
+#define XE_GEM_CREATE_FLAG_NEEDS_CPU_ACCESS (0x1 << 26)
__u32 flags;
/**
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [Intel-xe] [PATCH v2 14/14] drm/xe: fully turn on small-bar support
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
` (12 preceding siblings ...)
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 13/14] drm/xe/uapi: add the userspace bits for small-bar Matthew Auld
@ 2023-02-28 10:41 ` Matthew Auld
13 siblings, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 10:41 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi
This allows vram_size > io_size, instead of just clamping the vram size
to the BAR size, now that the driver supports it.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_mmio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index 5cacaa05759a..f6becf32ca49 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -216,7 +216,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
xe->mem.vram.io_start = pci_resource_start(pdev, GEN12_LMEM_BAR);
xe->mem.vram.io_size = min(usable_size,
pci_resource_len(pdev, GEN12_LMEM_BAR));
- xe->mem.vram.size = xe->mem.vram.io_size;
+ xe->mem.vram.size = vram_size;
if (!xe->mem.vram.size)
return -EIO;
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P()
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
@ 2023-02-28 14:09 ` Maarten Lankhorst
2023-02-28 17:57 ` Matt Roper
2023-02-28 18:54 ` Lucas De Marchi
2 siblings, 0 replies; 27+ messages in thread
From: Maarten Lankhorst @ 2023-02-28 14:09 UTC (permalink / raw)
To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi
[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]
Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com>
On 2023-02-28 11:41, Matthew Auld wrote:
> On ADL-P we currently trigger:
>
> xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !(dev_priv && 0))
> WARNING: CPU: 3 PID: 338 at drivers/gpu/drm/xe/display/ext/intel_pch.c:32 intel_pch_type+0xce/0x2f0 [xe]
>
> Signed-off-by: Matthew Auld<matthew.auld@intel.com>
> Cc: Lucas De Marchi<lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/display/i915_drv.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/display/i915_drv.h b/drivers/gpu/drm/xe/display/i915_drv.h
> index da83a957dd86..06348cb03f44 100644
> --- a/drivers/gpu/drm/xe/display/i915_drv.h
> +++ b/drivers/gpu/drm/xe/display/i915_drv.h
> @@ -73,7 +73,7 @@ static inline struct drm_i915_private *kdev_to_i915(struct device *kdev)
> #define IS_ROCKETLAKE(dev_priv) (dev_priv && 0)
> #define IS_DG1(dev_priv) IS_PLATFORM(dev_priv, XE_DG1)
> #define IS_ALDERLAKE_S(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_S)
> -#define IS_ALDERLAKE_P(dev_priv) (dev_priv && 0)
> +#define IS_ALDERLAKE_P(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_P)
> #define IS_XEHPSDV(dev_priv) (dev_priv && 0)
> #define IS_DG2(dev_priv) IS_PLATFORM(dev_priv, XE_DG2)
> #define IS_PONTEVECCHIO(dev_priv) IS_PLATFORM(dev_priv, XE_PVC)
[-- Attachment #2: Type: text/html, Size: 1993 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking Matthew Auld
@ 2023-02-28 14:24 ` Maarten Lankhorst
2023-02-28 15:05 ` Matthew Auld
0 siblings, 1 reply; 27+ messages in thread
From: Maarten Lankhorst @ 2023-02-28 14:24 UTC (permalink / raw)
To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi
On 2023-02-28 11:41, Matthew Auld wrote:
> Replace the allocation code with the i915 version. This simplifies the
> code a little, and importantly we get the accounting at the mgr level,
> which is useful for debug (and maybe userspace), plus per resource
> tracking so we can easily check if a resource is using one or pages in
> the mappable part of vram (useful for eviction), or if the resource is
> completely within the mappable portion (useful for checking if the
> resource can be safely CPU mapped).
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 18 +-
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 201 ++++++++++-----------
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 3 +-
> drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 6 +
> 4 files changed, 118 insertions(+), 110 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> index 2e8d07ad42ae..33bbd72711b3 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> @@ -144,7 +144,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
> {
> struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL);
> struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> - u64 stolen_size, pgsize;
> + u64 stolen_size, io_size, pgsize;
> int err;
>
> if (IS_DGFX(xe))
> @@ -163,7 +163,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
> if (pgsize < PAGE_SIZE)
> pgsize = PAGE_SIZE;
>
> - err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size, pgsize);
> + /*
> + * We don't try to attempt partial visible support for stolen vram,
> + * since stolen is always at the end of vram, and the BAR size is pretty
> + * much always 256M, with small-bar.
> + */
> + io_size = 0;
> + if (!xe_ttm_stolen_cpu_inaccessible(xe))
> + io_size = stolen_size;
Not directly related to this patch, but you've overloaded the meaning of
stolen_cpu_inaccessible. It was originally meant to indicate whether we
must map the BO through GGTT,
A better name would have been xe_ttm_stolen_in_bar2 or whatever. This
was also visible in its use in xe_ttm_stolen_mgr_init().
It's OK to leave it as is, but the conditional GGTT map in xe_bo.c that
now uses cpu_inaccessible needs to be fixed.
~Maarten
> +
> + err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size,
> + io_size, pgsize);
> if (err) {
> drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
> return;
> @@ -172,8 +182,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
> drm_dbg_kms(&xe->drm, "Initialized stolen memory support with %llu bytes\n",
> stolen_size);
>
> - if (!xe_ttm_stolen_cpu_inaccessible(xe))
> - mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, stolen_size);
> + if (io_size)
> + mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, io_size);
> }
>
> u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index e3ac8c6d3978..f703e962f499 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
> const struct ttm_place *place,
> struct ttm_resource **res)
> {
> - u64 max_bytes, cur_size, min_block_size;
> struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
> struct xe_ttm_vram_mgr_resource *vres;
> - u64 size, remaining_size, lpfn, fpfn;
> struct drm_buddy *mm = &mgr->mm;
> - unsigned long pages_per_block;
> - int r;
> -
> - lpfn = (u64)place->lpfn << PAGE_SHIFT;
> - if (!lpfn || lpfn > man->size)
> - lpfn = man->size;
> -
> - fpfn = (u64)place->fpfn << PAGE_SHIFT;
> + u64 size, remaining_size, min_page_size;
> + unsigned long lpfn;
> + int err;
>
> - max_bytes = mgr->manager.size;
> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> - pages_per_block = ~0ul;
> - } else {
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - pages_per_block = HPAGE_PMD_NR;
> -#else
> - /* default to 2MB */
> - pages_per_block = 2UL << (20UL - PAGE_SHIFT);
> -#endif
> -
> - pages_per_block = max_t(uint32_t, pages_per_block,
> - tbo->page_alignment);
> - }
> + lpfn = place->lpfn;
> + if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
> + lpfn = man->size >> PAGE_SHIFT;
>
> vres = kzalloc(sizeof(*vres), GFP_KERNEL);
> if (!vres)
> return -ENOMEM;
>
> ttm_resource_init(tbo, place, &vres->base);
> - remaining_size = vres->base.size;
>
> /* bail out quickly if there's likely not enough VRAM for this BO */
> - if (ttm_resource_manager_usage(man) > max_bytes) {
> - r = -ENOSPC;
> + if (ttm_resource_manager_usage(man) > man->size) {
> + err = -ENOSPC;
> goto error_fini;
> }
>
> @@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
> if (place->flags & TTM_PL_FLAG_TOPDOWN)
> vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>
> - if (fpfn || lpfn != man->size)
> - /* Allocate blocks in desired range */
> + if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
> vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>
> - mutex_lock(&mgr->lock);
> - while (remaining_size) {
> - if (tbo->page_alignment)
> - min_block_size = tbo->page_alignment << PAGE_SHIFT;
> - else
> - min_block_size = mgr->default_page_size;
> -
> - XE_BUG_ON(min_block_size < mm->chunk_size);
> -
> - /* Limit maximum size to 2GiB due to SG table limitations */
> - size = min(remaining_size, 2ULL << 30);
> -
> - if (size >= pages_per_block << PAGE_SHIFT)
> - min_block_size = pages_per_block << PAGE_SHIFT;
> -
> - cur_size = size;
> -
> - if (fpfn + size != place->lpfn << PAGE_SHIFT) {
> - /*
> - * Except for actual range allocation, modify the size and
> - * min_block_size conforming to continuous flag enablement
> - */
> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> - size = roundup_pow_of_two(size);
> - min_block_size = size;
> - /*
> - * Modify the size value if size is not
> - * aligned with min_block_size
> - */
> - } else if (!IS_ALIGNED(size, min_block_size)) {
> - size = round_up(size, min_block_size);
> - }
> - }
> + XE_BUG_ON(!vres->base.size);
> + size = vres->base.size;
>
> - r = drm_buddy_alloc_blocks(mm, fpfn,
> - lpfn,
> - size,
> - min_block_size,
> - &vres->blocks,
> - vres->flags);
> - if (unlikely(r))
> - goto error_free_blocks;
> + min_page_size = mgr->default_page_size;
> + if (tbo->page_alignment)
> + min_page_size = tbo->page_alignment << PAGE_SHIFT;
>
> - if (size > remaining_size)
> - remaining_size = 0;
> - else
> - remaining_size -= size;
> + XE_BUG_ON(min_page_size < mm->chunk_size);
> + XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
> + XE_BUG_ON(size > SZ_2G &&
> + (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
> + XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
> +
> + if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
> + place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> + unsigned long pages;
> +
> + size = roundup_pow_of_two(size);
> + min_page_size = size;
> +
> + pages = size >> ilog2(mm->chunk_size);
> + if (pages > lpfn)
> + lpfn = pages;
> }
> - mutex_unlock(&mgr->lock);
>
> - if (cur_size != size) {
> - struct drm_buddy_block *block;
> - struct list_head *trim_list;
> - u64 original_size;
> - LIST_HEAD(temp);
> + if (size > lpfn << PAGE_SHIFT) {
> + err = -E2BIG; /* don't trigger eviction */
> + goto error_fini;
> + }
>
> - trim_list = &vres->blocks;
> - original_size = vres->base.size;
> + mutex_lock(&mgr->lock);
> + if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
> + mutex_unlock(&mgr->lock);
> + err = -ENOSPC;
> + goto error_fini;
> + }
>
> + remaining_size = size;
> + do {
> /*
> - * If size value is rounded up to min_block_size, trim the last
> - * block to the required size
> + * Limit maximum size to 2GiB due to SG table limitations.
> + * FIXME: Should maybe be handled as part of sg construction.
> */
> - if (!list_is_singular(&vres->blocks)) {
> - block = list_last_entry(&vres->blocks, typeof(*block), link);
> - list_move_tail(&block->link, &temp);
> - trim_list = &temp;
> - /*
> - * Compute the original_size value by subtracting the
> - * last block size with (aligned size - original size)
> - */
> - original_size = drm_buddy_block_size(mm, block) -
> - (size - cur_size);
> - }
> + u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
> +
> + err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
> + (u64)lpfn << PAGE_SHIFT,
> + alloc_size,
> + min_page_size,
> + &vres->blocks,
> + vres->flags);
> + if (err)
> + goto error_free_blocks;
>
> - mutex_lock(&mgr->lock);
> - drm_buddy_block_trim(mm,
> - original_size,
> - trim_list);
> - mutex_unlock(&mgr->lock);
> + remaining_size -= alloc_size;
> + } while (remaining_size);
>
> - if (!list_empty(&temp))
> - list_splice_tail(trim_list, &vres->blocks);
> + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> + if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
> + size = vres->base.size;
> }
>
> + if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
> + vres->used_visible_size = size;
> + } else {
> + struct drm_buddy_block *block;
> +
> + list_for_each_entry(block, &vres->blocks, link) {
> + u64 start = drm_buddy_block_offset(block);
> +
> + if (start < mgr->visible_size) {
> + u64 end = start + drm_buddy_block_size(mm, block);
> +
> + vres->used_visible_size +=
> + min(end, mgr->visible_size) - start;
> + }
> + }
> + }
> +
> + mgr->visible_avail -= vres->used_visible_size;
> + mutex_unlock(&mgr->lock);
> +
> if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
> xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
> vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
> @@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
> ttm_resource_fini(man, &vres->base);
> kfree(vres);
>
> - return r;
> + return err;
> }
>
> static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
> @@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>
> mutex_lock(&mgr->lock);
> drm_buddy_free_list(mm, &vres->blocks);
> + mgr->visible_avail += vres->used_visible_size;
> mutex_unlock(&mgr->lock);
>
> ttm_resource_fini(man, res);
> @@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
> struct drm_buddy *mm = &mgr->mm;
>
> mutex_lock(&mgr->lock);
> + drm_printf(printer, "default_page_size: %lluKiB\n",
> + mgr->default_page_size >> 10);
> + drm_printf(printer, "visible_avail: %lluMiB\n",
> + (u64)mgr->visible_avail >> 20);
> + drm_printf(printer, "visible_size: %lluMiB\n",
> + (u64)mgr->visible_size >> 20);
> +
> drm_buddy_print(mm, printer);
> mutex_unlock(&mgr->lock);
> drm_printf(printer, "man size:%llu\n", man->size);
> @@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> if (ttm_resource_manager_evict_all(&xe->ttm, man))
> return;
>
> + WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
> +
> drm_buddy_fini(&mgr->mm);
>
> ttm_resource_manager_cleanup(&mgr->manager);
> @@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> }
>
> int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> - u32 mem_type, u64 size, u64 default_page_size)
> + u32 mem_type, u64 size, u64 io_size,
> + u64 default_page_size)
> {
> struct ttm_resource_manager *man = &mgr->manager;
> int err;
> @@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> mgr->mem_type = mem_type;
> mutex_init(&mgr->lock);
> mgr->default_page_size = default_page_size;
> + mgr->visible_size = io_size;
> + mgr->visible_avail = io_size;
>
> ttm_resource_manager_init(man, &xe->ttm, size);
> err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
> @@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr)
> mgr->gt = gt;
>
> return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 + gt->info.vram_id,
> - gt->mem.vram.size, PAGE_SIZE);
> + gt->mem.vram.size, gt->mem.vram.io_size,
> + PAGE_SIZE);
> }
>
> int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> index 78f332d26224..35e5367a79fb 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> @@ -13,7 +13,8 @@ struct xe_device;
> struct xe_gt;
>
> int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> - u32 mem_type, u64 size, u64 default_page_size);
> + u32 mem_type, u64 size, u64 io_size,
> + u64 default_page_size);
> int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr);
> int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
> struct ttm_resource *res,
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> index 39aa2ec1b968..3d9417ff7434 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> @@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
> struct ttm_resource_manager manager;
> /** @mm: DRM buddy allocator which manages the VRAM */
> struct drm_buddy mm;
> + /** @visible_size: Proped size of the CPU visible portion */
> + u64 visible_size;
> + /** @visible_avail: CPU visible portion still unallocated */
> + u64 visible_avail;
> /** @default_page_size: default page size */
> u64 default_page_size;
> /** @lock: protects allocations of VRAM */
> @@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
> struct ttm_resource base;
> /** @blocks: list of DRM buddy blocks */
> struct list_head blocks;
> + /** @used_visible_size: How many CPU visible bytes this resource is using */
> + u64 used_visible_size;
> /** @flags: flags associated with the resource */
> unsigned long flags;
> };
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS Matthew Auld
@ 2023-02-28 14:48 ` Maarten Lankhorst
2023-02-28 15:20 ` Matthew Auld
2023-02-28 15:22 ` Matthew Auld
0 siblings, 2 replies; 27+ messages in thread
From: Maarten Lankhorst @ 2023-02-28 14:48 UTC (permalink / raw)
To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi
Hey,
On 2023-02-28 11:41, Matthew Auld wrote:
> The display code wants to read the clear color value from the buffer.
> However if the buffer is the non-mappable part of lmem then we fail the
> kmap. The simplest solution is to just mark the buffer with
> XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
> mappable part of lmem, or migrate it there.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 65c0bc28a3d1..66e1309e21d8 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -203,6 +203,14 @@ static struct i915_vma *__xe_pin_fb_vma(struct intel_framebuffer *fb,
> if (ret)
> goto err;
>
> + /*
> + * For this type of buffer we need to able to read from the CPU the
> + * clear color value found in the buffer. This doesn't do anything on
> + * non small-bar devices.
> + */
> + if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
> + bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
While we do need a change, do we need to do this inside pinning? We
could also require userspace to create VRAM BO's with CPU_ACCESS, and
reject CCS here.
Of course we should probably also add a UAPI to allow setting the
SCANOUT flag on externally imported DMA-BUF bo's, i don't think we
require CPU_ACCESS flag for that as it's not our BO.
Might require some more thinking on how we want to handle this.
Other patches look good, except for the comments I had:
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
~Maarten
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking
2023-02-28 14:24 ` Maarten Lankhorst
@ 2023-02-28 15:05 ` Matthew Auld
2023-02-28 20:26 ` Maarten Lankhorst
0 siblings, 1 reply; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 15:05 UTC (permalink / raw)
To: Maarten Lankhorst, intel-xe; +Cc: Lucas De Marchi
On 28/02/2023 14:24, Maarten Lankhorst wrote:
> On 2023-02-28 11:41, Matthew Auld wrote:
>> Replace the allocation code with the i915 version. This simplifies the
>> code a little, and importantly we get the accounting at the mgr level,
>> which is useful for debug (and maybe userspace), plus per resource
>> tracking so we can easily check if a resource is using one or pages in
>> the mappable part of vram (useful for eviction), or if the resource is
>> completely within the mappable portion (useful for checking if the
>> resource can be safely CPU mapped).
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 18 +-
>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 201 ++++++++++-----------
>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 3 +-
>> drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 6 +
>> 4 files changed, 118 insertions(+), 110 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> index 2e8d07ad42ae..33bbd72711b3 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> @@ -144,7 +144,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>> {
>> struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm,
>> sizeof(*mgr), GFP_KERNEL);
>> struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>> - u64 stolen_size, pgsize;
>> + u64 stolen_size, io_size, pgsize;
>> int err;
>> if (IS_DGFX(xe))
>> @@ -163,7 +163,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>> if (pgsize < PAGE_SIZE)
>> pgsize = PAGE_SIZE;
>> - err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN,
>> stolen_size, pgsize);
>> + /*
>> + * We don't try to attempt partial visible support for stolen vram,
>> + * since stolen is always at the end of vram, and the BAR size is
>> pretty
>> + * much always 256M, with small-bar.
>> + */
>> + io_size = 0;
>> + if (!xe_ttm_stolen_cpu_inaccessible(xe))
>> + io_size = stolen_size;
>
> Not directly related to this patch, but you've overloaded the meaning of
> stolen_cpu_inaccessible. It was originally meant to indicate whether we
> must map the BO through GGTT,
>
> A better name would have been xe_ttm_stolen_in_bar2 or whatever. This
> was also visible in its use in xe_ttm_stolen_mgr_init().
>
> It's OK to leave it as is, but the conditional GGTT map in xe_bo.c that
> now uses cpu_inaccessible needs to be fixed.
OK, my thinking was that it didn't really matter given that xe_bo_vmap()
will eventually fail anyway (it should return -EIO I think), since lack
of mappable aperture on dgpu means we have no fallback mode.
I was interperating the:
if (flags & XE_BO_CREATE_STOLEN_BIT &&
xe_ttm_stolen_cpu_inaccessible(xe))
flags |= XE_BO_CREATE_GGTT_BIT;
to just mean "please *attempt* to map through the aperture, if not
directly CPU accessible".
Should I just add an is_dgpu() check and bail early? Also thanks for
taking a look at the series.
>
> ~Maarten
>
>
>> +
>> + err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN,
>> stolen_size,
>> + io_size, pgsize);
>> if (err) {
>> drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
>> return;
>> @@ -172,8 +182,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>> drm_dbg_kms(&xe->drm, "Initialized stolen memory support with
>> %llu bytes\n",
>> stolen_size);
>> - if (!xe_ttm_stolen_cpu_inaccessible(xe))
>> - mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base,
>> stolen_size);
>> + if (io_size)
>> + mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base,
>> io_size);
>> }
>> u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> index e3ac8c6d3978..f703e962f499 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> @@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct
>> ttm_resource_manager *man,
>> const struct ttm_place *place,
>> struct ttm_resource **res)
>> {
>> - u64 max_bytes, cur_size, min_block_size;
>> struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>> struct xe_ttm_vram_mgr_resource *vres;
>> - u64 size, remaining_size, lpfn, fpfn;
>> struct drm_buddy *mm = &mgr->mm;
>> - unsigned long pages_per_block;
>> - int r;
>> -
>> - lpfn = (u64)place->lpfn << PAGE_SHIFT;
>> - if (!lpfn || lpfn > man->size)
>> - lpfn = man->size;
>> -
>> - fpfn = (u64)place->fpfn << PAGE_SHIFT;
>> + u64 size, remaining_size, min_page_size;
>> + unsigned long lpfn;
>> + int err;
>> - max_bytes = mgr->manager.size;
>> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> - pages_per_block = ~0ul;
>> - } else {
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> - pages_per_block = HPAGE_PMD_NR;
>> -#else
>> - /* default to 2MB */
>> - pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>> -#endif
>> -
>> - pages_per_block = max_t(uint32_t, pages_per_block,
>> - tbo->page_alignment);
>> - }
>> + lpfn = place->lpfn;
>> + if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
>> + lpfn = man->size >> PAGE_SHIFT;
>> vres = kzalloc(sizeof(*vres), GFP_KERNEL);
>> if (!vres)
>> return -ENOMEM;
>> ttm_resource_init(tbo, place, &vres->base);
>> - remaining_size = vres->base.size;
>> /* bail out quickly if there's likely not enough VRAM for this
>> BO */
>> - if (ttm_resource_manager_usage(man) > max_bytes) {
>> - r = -ENOSPC;
>> + if (ttm_resource_manager_usage(man) > man->size) {
>> + err = -ENOSPC;
>> goto error_fini;
>> }
>> @@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct
>> ttm_resource_manager *man,
>> if (place->flags & TTM_PL_FLAG_TOPDOWN)
>> vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>> - if (fpfn || lpfn != man->size)
>> - /* Allocate blocks in desired range */
>> + if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
>> vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>> - mutex_lock(&mgr->lock);
>> - while (remaining_size) {
>> - if (tbo->page_alignment)
>> - min_block_size = tbo->page_alignment << PAGE_SHIFT;
>> - else
>> - min_block_size = mgr->default_page_size;
>> -
>> - XE_BUG_ON(min_block_size < mm->chunk_size);
>> -
>> - /* Limit maximum size to 2GiB due to SG table limitations */
>> - size = min(remaining_size, 2ULL << 30);
>> -
>> - if (size >= pages_per_block << PAGE_SHIFT)
>> - min_block_size = pages_per_block << PAGE_SHIFT;
>> -
>> - cur_size = size;
>> -
>> - if (fpfn + size != place->lpfn << PAGE_SHIFT) {
>> - /*
>> - * Except for actual range allocation, modify the size and
>> - * min_block_size conforming to continuous flag enablement
>> - */
>> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> - size = roundup_pow_of_two(size);
>> - min_block_size = size;
>> - /*
>> - * Modify the size value if size is not
>> - * aligned with min_block_size
>> - */
>> - } else if (!IS_ALIGNED(size, min_block_size)) {
>> - size = round_up(size, min_block_size);
>> - }
>> - }
>> + XE_BUG_ON(!vres->base.size);
>> + size = vres->base.size;
>> - r = drm_buddy_alloc_blocks(mm, fpfn,
>> - lpfn,
>> - size,
>> - min_block_size,
>> - &vres->blocks,
>> - vres->flags);
>> - if (unlikely(r))
>> - goto error_free_blocks;
>> + min_page_size = mgr->default_page_size;
>> + if (tbo->page_alignment)
>> + min_page_size = tbo->page_alignment << PAGE_SHIFT;
>> - if (size > remaining_size)
>> - remaining_size = 0;
>> - else
>> - remaining_size -= size;
>> + XE_BUG_ON(min_page_size < mm->chunk_size);
>> + XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
>> + XE_BUG_ON(size > SZ_2G &&
>> + (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
>> + XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
>> +
>> + if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
>> + place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> + unsigned long pages;
>> +
>> + size = roundup_pow_of_two(size);
>> + min_page_size = size;
>> +
>> + pages = size >> ilog2(mm->chunk_size);
>> + if (pages > lpfn)
>> + lpfn = pages;
>> }
>> - mutex_unlock(&mgr->lock);
>> - if (cur_size != size) {
>> - struct drm_buddy_block *block;
>> - struct list_head *trim_list;
>> - u64 original_size;
>> - LIST_HEAD(temp);
>> + if (size > lpfn << PAGE_SHIFT) {
>> + err = -E2BIG; /* don't trigger eviction */
>> + goto error_fini;
>> + }
>> - trim_list = &vres->blocks;
>> - original_size = vres->base.size;
>> + mutex_lock(&mgr->lock);
>> + if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
>> + mutex_unlock(&mgr->lock);
>> + err = -ENOSPC;
>> + goto error_fini;
>> + }
>> + remaining_size = size;
>> + do {
>> /*
>> - * If size value is rounded up to min_block_size, trim the last
>> - * block to the required size
>> + * Limit maximum size to 2GiB due to SG table limitations.
>> + * FIXME: Should maybe be handled as part of sg construction.
>> */
>> - if (!list_is_singular(&vres->blocks)) {
>> - block = list_last_entry(&vres->blocks, typeof(*block),
>> link);
>> - list_move_tail(&block->link, &temp);
>> - trim_list = &temp;
>> - /*
>> - * Compute the original_size value by subtracting the
>> - * last block size with (aligned size - original size)
>> - */
>> - original_size = drm_buddy_block_size(mm, block) -
>> - (size - cur_size);
>> - }
>> + u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
>> +
>> + err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>> + (u64)lpfn << PAGE_SHIFT,
>> + alloc_size,
>> + min_page_size,
>> + &vres->blocks,
>> + vres->flags);
>> + if (err)
>> + goto error_free_blocks;
>> - mutex_lock(&mgr->lock);
>> - drm_buddy_block_trim(mm,
>> - original_size,
>> - trim_list);
>> - mutex_unlock(&mgr->lock);
>> + remaining_size -= alloc_size;
>> + } while (remaining_size);
>> - if (!list_empty(&temp))
>> - list_splice_tail(trim_list, &vres->blocks);
>> + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> + if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
>> + size = vres->base.size;
>> }
>> + if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
>> + vres->used_visible_size = size;
>> + } else {
>> + struct drm_buddy_block *block;
>> +
>> + list_for_each_entry(block, &vres->blocks, link) {
>> + u64 start = drm_buddy_block_offset(block);
>> +
>> + if (start < mgr->visible_size) {
>> + u64 end = start + drm_buddy_block_size(mm, block);
>> +
>> + vres->used_visible_size +=
>> + min(end, mgr->visible_size) - start;
>> + }
>> + }
>> + }
>> +
>> + mgr->visible_avail -= vres->used_visible_size;
>> + mutex_unlock(&mgr->lock);
>> +
>> if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>> xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>> vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>> @@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct
>> ttm_resource_manager *man,
>> ttm_resource_fini(man, &vres->base);
>> kfree(vres);
>> - return r;
>> + return err;
>> }
>> static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>> @@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct
>> ttm_resource_manager *man,
>> mutex_lock(&mgr->lock);
>> drm_buddy_free_list(mm, &vres->blocks);
>> + mgr->visible_avail += vres->used_visible_size;
>> mutex_unlock(&mgr->lock);
>> ttm_resource_fini(man, res);
>> @@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct
>> ttm_resource_manager *man,
>> struct drm_buddy *mm = &mgr->mm;
>> mutex_lock(&mgr->lock);
>> + drm_printf(printer, "default_page_size: %lluKiB\n",
>> + mgr->default_page_size >> 10);
>> + drm_printf(printer, "visible_avail: %lluMiB\n",
>> + (u64)mgr->visible_avail >> 20);
>> + drm_printf(printer, "visible_size: %lluMiB\n",
>> + (u64)mgr->visible_size >> 20);
>> +
>> drm_buddy_print(mm, printer);
>> mutex_unlock(&mgr->lock);
>> drm_printf(printer, "man size:%llu\n", man->size);
>> @@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device
>> *dev, void *arg)
>> if (ttm_resource_manager_evict_all(&xe->ttm, man))
>> return;
>> + WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>> +
>> drm_buddy_fini(&mgr->mm);
>> ttm_resource_manager_cleanup(&mgr->manager);
>> @@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device
>> *dev, void *arg)
>> }
>> int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct
>> xe_ttm_vram_mgr *mgr,
>> - u32 mem_type, u64 size, u64 default_page_size)
>> + u32 mem_type, u64 size, u64 io_size,
>> + u64 default_page_size)
>> {
>> struct ttm_resource_manager *man = &mgr->manager;
>> int err;
>> @@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe,
>> struct xe_ttm_vram_mgr *mgr,
>> mgr->mem_type = mem_type;
>> mutex_init(&mgr->lock);
>> mgr->default_page_size = default_page_size;
>> + mgr->visible_size = io_size;
>> + mgr->visible_avail = io_size;
>> ttm_resource_manager_init(man, &xe->ttm, size);
>> err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
>> @@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct
>> xe_ttm_vram_mgr *mgr)
>> mgr->gt = gt;
>> return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 +
>> gt->info.vram_id,
>> - gt->mem.vram.size, PAGE_SIZE);
>> + gt->mem.vram.size, gt->mem.vram.io_size,
>> + PAGE_SIZE);
>> }
>> int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> index 78f332d26224..35e5367a79fb 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> @@ -13,7 +13,8 @@ struct xe_device;
>> struct xe_gt;
>> int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct
>> xe_ttm_vram_mgr *mgr,
>> - u32 mem_type, u64 size, u64 default_page_size);
>> + u32 mem_type, u64 size, u64 io_size,
>> + u64 default_page_size);
>> int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr
>> *mgr);
>> int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>> struct ttm_resource *res,
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> index 39aa2ec1b968..3d9417ff7434 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> @@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
>> struct ttm_resource_manager manager;
>> /** @mm: DRM buddy allocator which manages the VRAM */
>> struct drm_buddy mm;
>> + /** @visible_size: Proped size of the CPU visible portion */
>> + u64 visible_size;
>> + /** @visible_avail: CPU visible portion still unallocated */
>> + u64 visible_avail;
>> /** @default_page_size: default page size */
>> u64 default_page_size;
>> /** @lock: protects allocations of VRAM */
>> @@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
>> struct ttm_resource base;
>> /** @blocks: list of DRM buddy blocks */
>> struct list_head blocks;
>> + /** @used_visible_size: How many CPU visible bytes this resource
>> is using */
>> + u64 used_visible_size;
>> /** @flags: flags associated with the resource */
>> unsigned long flags;
>> };
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-02-28 14:48 ` Maarten Lankhorst
@ 2023-02-28 15:20 ` Matthew Auld
2023-03-02 11:51 ` Maarten Lankhorst
2023-02-28 15:22 ` Matthew Auld
1 sibling, 1 reply; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 15:20 UTC (permalink / raw)
To: Maarten Lankhorst, intel-xe; +Cc: Lucas De Marchi
On 28/02/2023 14:48, Maarten Lankhorst wrote:
> Hey,
>
> On 2023-02-28 11:41, Matthew Auld wrote:
>> The display code wants to read the clear color value from the buffer.
>> However if the buffer is the non-mappable part of lmem then we fail the
>> kmap. The simplest solution is to just mark the buffer with
>> XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
>> mappable part of lmem, or migrate it there.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> index 65c0bc28a3d1..66e1309e21d8 100644
>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> @@ -203,6 +203,14 @@ static struct i915_vma *__xe_pin_fb_vma(struct
>> intel_framebuffer *fb,
>> if (ret)
>> goto err;
>> + /*
>> + * For this type of buffer we need to able to read from the CPU the
>> + * clear color value found in the buffer. This doesn't do
>> anything on
>> + * non small-bar devices.
>> + */
>> + if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
>> + bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
>
> While we do need a change, do we need to do this inside pinning? We
> could also require userspace to create VRAM BO's with CPU_ACCESS, and
> reject CCS here.
Do you mean reject CCS if CPU_ACCESS is not set? The trouble is that
CPU_ACCESS also requires system memory as a spill option when specifying
the placements, which then prevents using CCS. Or at least that's how it
was in i915.
>
> Of course we should probably also add a UAPI to allow setting the
> SCANOUT flag on externally imported DMA-BUF bo's, i don't think we
> require CPU_ACCESS flag for that as it's not our BO.
Hmm, maybe if SCANOUT then internally use the mappable part of lmem by
default? For dumb buffers I think it does something like that. And keep
the above as a fallback?
>
> Might require some more thinking on how we want to handle this.
>
> Other patches look good, except for the comments I had:
>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>
> ~Maarten
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-02-28 14:48 ` Maarten Lankhorst
2023-02-28 15:20 ` Matthew Auld
@ 2023-02-28 15:22 ` Matthew Auld
1 sibling, 0 replies; 27+ messages in thread
From: Matthew Auld @ 2023-02-28 15:22 UTC (permalink / raw)
To: Maarten Lankhorst, intel-xe; +Cc: Lucas De Marchi
On 28/02/2023 14:48, Maarten Lankhorst wrote:
> Hey,
>
> On 2023-02-28 11:41, Matthew Auld wrote:
>> The display code wants to read the clear color value from the buffer.
>> However if the buffer is the non-mappable part of lmem then we fail the
>> kmap. The simplest solution is to just mark the buffer with
>> XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
>> mappable part of lmem, or migrate it there.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> index 65c0bc28a3d1..66e1309e21d8 100644
>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> @@ -203,6 +203,14 @@ static struct i915_vma *__xe_pin_fb_vma(struct
>> intel_framebuffer *fb,
>> if (ret)
>> goto err;
>> + /*
>> + * For this type of buffer we need to able to read from the CPU the
>> + * clear color value found in the buffer. This doesn't do
>> anything on
>> + * non small-bar devices.
>> + */
>> + if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
>> + bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
>
> While we do need a change, do we need to do this inside pinning? We
> could also require userspace to create VRAM BO's with CPU_ACCESS, and
> reject CCS here.
>
> Of course we should probably also add a UAPI to allow setting the
> SCANOUT flag on externally imported DMA-BUF bo's, i don't think we
> require CPU_ACCESS flag for that as it's not our BO.
>
> Might require some more thinking on how we want to handle this.
>
> Other patches look good, except for the comments I had:
>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Thanks.
>
> ~Maarten
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P()
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
2023-02-28 14:09 ` Maarten Lankhorst
@ 2023-02-28 17:57 ` Matt Roper
2023-02-28 18:54 ` Lucas De Marchi
2 siblings, 0 replies; 27+ messages in thread
From: Matt Roper @ 2023-02-28 17:57 UTC (permalink / raw)
To: Matthew Auld; +Cc: Lucas De Marchi, intel-xe
On Tue, Feb 28, 2023 at 10:41:24AM +0000, Matthew Auld wrote:
> On ADL-P we currently trigger:
>
> xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !(dev_priv && 0))
> WARNING: CPU: 3 PID: 338 at drivers/gpu/drm/xe/display/ext/intel_pch.c:32 intel_pch_type+0xce/0x2f0 [xe]
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
> ---
> drivers/gpu/drm/xe/display/i915_drv.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/display/i915_drv.h b/drivers/gpu/drm/xe/display/i915_drv.h
> index da83a957dd86..06348cb03f44 100644
> --- a/drivers/gpu/drm/xe/display/i915_drv.h
> +++ b/drivers/gpu/drm/xe/display/i915_drv.h
> @@ -73,7 +73,7 @@ static inline struct drm_i915_private *kdev_to_i915(struct device *kdev)
> #define IS_ROCKETLAKE(dev_priv) (dev_priv && 0)
> #define IS_DG1(dev_priv) IS_PLATFORM(dev_priv, XE_DG1)
> #define IS_ALDERLAKE_S(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_S)
> -#define IS_ALDERLAKE_P(dev_priv) (dev_priv && 0)
> +#define IS_ALDERLAKE_P(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_P)
> #define IS_XEHPSDV(dev_priv) (dev_priv && 0)
> #define IS_DG2(dev_priv) IS_PLATFORM(dev_priv, XE_DG2)
> #define IS_PONTEVECCHIO(dev_priv) IS_PLATFORM(dev_priv, XE_PVC)
> --
> 2.39.2
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P()
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
2023-02-28 14:09 ` Maarten Lankhorst
2023-02-28 17:57 ` Matt Roper
@ 2023-02-28 18:54 ` Lucas De Marchi
2 siblings, 0 replies; 27+ messages in thread
From: Lucas De Marchi @ 2023-02-28 18:54 UTC (permalink / raw)
To: Matthew Auld; +Cc: intel-xe
On Tue, Feb 28, 2023 at 10:41:24AM +0000, Matthew Auld wrote:
>On ADL-P we currently trigger:
>
>xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !(dev_priv && 0))
>WARNING: CPU: 3 PID: 338 at drivers/gpu/drm/xe/display/ext/intel_pch.c:32 intel_pch_type+0xce/0x2f0 [xe]
>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/231
or maybe just "References" ?
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi
>---
> drivers/gpu/drm/xe/display/i915_drv.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/display/i915_drv.h b/drivers/gpu/drm/xe/display/i915_drv.h
>index da83a957dd86..06348cb03f44 100644
>--- a/drivers/gpu/drm/xe/display/i915_drv.h
>+++ b/drivers/gpu/drm/xe/display/i915_drv.h
>@@ -73,7 +73,7 @@ static inline struct drm_i915_private *kdev_to_i915(struct device *kdev)
> #define IS_ROCKETLAKE(dev_priv) (dev_priv && 0)
> #define IS_DG1(dev_priv) IS_PLATFORM(dev_priv, XE_DG1)
> #define IS_ALDERLAKE_S(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_S)
>-#define IS_ALDERLAKE_P(dev_priv) (dev_priv && 0)
>+#define IS_ALDERLAKE_P(dev_priv) IS_PLATFORM(dev_priv, XE_ALDERLAKE_P)
> #define IS_XEHPSDV(dev_priv) (dev_priv && 0)
> #define IS_DG2(dev_priv) IS_PLATFORM(dev_priv, XE_DG2)
> #define IS_PONTEVECCHIO(dev_priv) IS_PLATFORM(dev_priv, XE_PVC)
>--
>2.39.2
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking
2023-02-28 15:05 ` Matthew Auld
@ 2023-02-28 20:26 ` Maarten Lankhorst
0 siblings, 0 replies; 27+ messages in thread
From: Maarten Lankhorst @ 2023-02-28 20:26 UTC (permalink / raw)
To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi
On 2023-02-28 16:05, Matthew Auld wrote:
> On 28/02/2023 14:24, Maarten Lankhorst wrote:
>> On 2023-02-28 11:41, Matthew Auld wrote:
>>> Replace the allocation code with the i915 version. This simplifies the
>>> code a little, and importantly we get the accounting at the mgr level,
>>> which is useful for debug (and maybe userspace), plus per resource
>>> tracking so we can easily check if a resource is using one or pages in
>>> the mappable part of vram (useful for eviction), or if the resource is
>>> completely within the mappable portion (useful for checking if the
>>> resource can be safely CPU mapped).
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 18 +-
>>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 201
>>> ++++++++++-----------
>>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 3 +-
>>> drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 6 +
>>> 4 files changed, 118 insertions(+), 110 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> index 2e8d07ad42ae..33bbd72711b3 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> @@ -144,7 +144,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>> {
>>> struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm,
>>> sizeof(*mgr), GFP_KERNEL);
>>> struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>>> - u64 stolen_size, pgsize;
>>> + u64 stolen_size, io_size, pgsize;
>>> int err;
>>> if (IS_DGFX(xe))
>>> @@ -163,7 +163,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>> if (pgsize < PAGE_SIZE)
>>> pgsize = PAGE_SIZE;
>>> - err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN,
>>> stolen_size, pgsize);
>>> + /*
>>> + * We don't try to attempt partial visible support for stolen
>>> vram,
>>> + * since stolen is always at the end of vram, and the BAR size
>>> is pretty
>>> + * much always 256M, with small-bar.
>>> + */
>>> + io_size = 0;
>>> + if (!xe_ttm_stolen_cpu_inaccessible(xe))
>>> + io_size = stolen_size;
>>
>> Not directly related to this patch, but you've overloaded the meaning
>> of stolen_cpu_inaccessible. It was originally meant to indicate
>> whether we must map the BO through GGTT,
>>
>> A better name would have been xe_ttm_stolen_in_bar2 or whatever. This
>> was also visible in its use in xe_ttm_stolen_mgr_init().
>>
>> It's OK to leave it as is, but the conditional GGTT map in xe_bo.c
>> that now uses cpu_inaccessible needs to be fixed.
>
> OK, my thinking was that it didn't really matter given that
> xe_bo_vmap() will eventually fail anyway (it should return -EIO I
> think), since lack of mappable aperture on dgpu means we have no
> fallback mode.
>
> I was interperating the:
>
> if (flags & XE_BO_CREATE_STOLEN_BIT &&
> xe_ttm_stolen_cpu_inaccessible(xe))
> flags |= XE_BO_CREATE_GGTT_BIT;
>
> to just mean "please *attempt* to map through the aperture, if not
> directly CPU accessible".
>
> Should I just add an is_dgpu() check and bail early? Also thanks for
> taking a look at the series.
I think it should be a separate macro, or hardcoded !IS_DGFX && < 1270
>
>>
>> ~Maarten
>>
>>
>>> +
>>> + err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN,
>>> stolen_size,
>>> + io_size, pgsize);
>>> if (err) {
>>> drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
>>> return;
>>> @@ -172,8 +182,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>> drm_dbg_kms(&xe->drm, "Initialized stolen memory support with
>>> %llu bytes\n",
>>> stolen_size);
>>> - if (!xe_ttm_stolen_cpu_inaccessible(xe))
>>> - mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base,
>>> stolen_size);
>>> + if (io_size)
>>> + mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base,
>>> io_size);
>>> }
>>> u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> index e3ac8c6d3978..f703e962f499 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> @@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> const struct ttm_place *place,
>>> struct ttm_resource **res)
>>> {
>>> - u64 max_bytes, cur_size, min_block_size;
>>> struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>>> struct xe_ttm_vram_mgr_resource *vres;
>>> - u64 size, remaining_size, lpfn, fpfn;
>>> struct drm_buddy *mm = &mgr->mm;
>>> - unsigned long pages_per_block;
>>> - int r;
>>> -
>>> - lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>> - if (!lpfn || lpfn > man->size)
>>> - lpfn = man->size;
>>> -
>>> - fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>> + u64 size, remaining_size, min_page_size;
>>> + unsigned long lpfn;
>>> + int err;
>>> - max_bytes = mgr->manager.size;
>>> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> - pages_per_block = ~0ul;
>>> - } else {
>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> - pages_per_block = HPAGE_PMD_NR;
>>> -#else
>>> - /* default to 2MB */
>>> - pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>>> -#endif
>>> -
>>> - pages_per_block = max_t(uint32_t, pages_per_block,
>>> - tbo->page_alignment);
>>> - }
>>> + lpfn = place->lpfn;
>>> + if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
>>> + lpfn = man->size >> PAGE_SHIFT;
>>> vres = kzalloc(sizeof(*vres), GFP_KERNEL);
>>> if (!vres)
>>> return -ENOMEM;
>>> ttm_resource_init(tbo, place, &vres->base);
>>> - remaining_size = vres->base.size;
>>> /* bail out quickly if there's likely not enough VRAM for this
>>> BO */
>>> - if (ttm_resource_manager_usage(man) > max_bytes) {
>>> - r = -ENOSPC;
>>> + if (ttm_resource_manager_usage(man) > man->size) {
>>> + err = -ENOSPC;
>>> goto error_fini;
>>> }
>>> @@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> if (place->flags & TTM_PL_FLAG_TOPDOWN)
>>> vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>>> - if (fpfn || lpfn != man->size)
>>> - /* Allocate blocks in desired range */
>>> + if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
>>> vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>>> - mutex_lock(&mgr->lock);
>>> - while (remaining_size) {
>>> - if (tbo->page_alignment)
>>> - min_block_size = tbo->page_alignment << PAGE_SHIFT;
>>> - else
>>> - min_block_size = mgr->default_page_size;
>>> -
>>> - XE_BUG_ON(min_block_size < mm->chunk_size);
>>> -
>>> - /* Limit maximum size to 2GiB due to SG table limitations */
>>> - size = min(remaining_size, 2ULL << 30);
>>> -
>>> - if (size >= pages_per_block << PAGE_SHIFT)
>>> - min_block_size = pages_per_block << PAGE_SHIFT;
>>> -
>>> - cur_size = size;
>>> -
>>> - if (fpfn + size != place->lpfn << PAGE_SHIFT) {
>>> - /*
>>> - * Except for actual range allocation, modify the size and
>>> - * min_block_size conforming to continuous flag enablement
>>> - */
>>> - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> - size = roundup_pow_of_two(size);
>>> - min_block_size = size;
>>> - /*
>>> - * Modify the size value if size is not
>>> - * aligned with min_block_size
>>> - */
>>> - } else if (!IS_ALIGNED(size, min_block_size)) {
>>> - size = round_up(size, min_block_size);
>>> - }
>>> - }
>>> + XE_BUG_ON(!vres->base.size);
>>> + size = vres->base.size;
>>> - r = drm_buddy_alloc_blocks(mm, fpfn,
>>> - lpfn,
>>> - size,
>>> - min_block_size,
>>> - &vres->blocks,
>>> - vres->flags);
>>> - if (unlikely(r))
>>> - goto error_free_blocks;
>>> + min_page_size = mgr->default_page_size;
>>> + if (tbo->page_alignment)
>>> + min_page_size = tbo->page_alignment << PAGE_SHIFT;
>>> - if (size > remaining_size)
>>> - remaining_size = 0;
>>> - else
>>> - remaining_size -= size;
>>> + XE_BUG_ON(min_page_size < mm->chunk_size);
>>> + XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
>>> + XE_BUG_ON(size > SZ_2G &&
>>> + (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
>>> + XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
>>> +
>>> + if (place->fpfn + (vres->base.size >> PAGE_SHIFT) !=
>>> place->lpfn &&
>>> + place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> + unsigned long pages;
>>> +
>>> + size = roundup_pow_of_two(size);
>>> + min_page_size = size;
>>> +
>>> + pages = size >> ilog2(mm->chunk_size);
>>> + if (pages > lpfn)
>>> + lpfn = pages;
>>> }
>>> - mutex_unlock(&mgr->lock);
>>> - if (cur_size != size) {
>>> - struct drm_buddy_block *block;
>>> - struct list_head *trim_list;
>>> - u64 original_size;
>>> - LIST_HEAD(temp);
>>> + if (size > lpfn << PAGE_SHIFT) {
>>> + err = -E2BIG; /* don't trigger eviction */
>>> + goto error_fini;
>>> + }
>>> - trim_list = &vres->blocks;
>>> - original_size = vres->base.size;
>>> + mutex_lock(&mgr->lock);
>>> + if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
>>> + mutex_unlock(&mgr->lock);
>>> + err = -ENOSPC;
>>> + goto error_fini;
>>> + }
>>> + remaining_size = size;
>>> + do {
>>> /*
>>> - * If size value is rounded up to min_block_size, trim the
>>> last
>>> - * block to the required size
>>> + * Limit maximum size to 2GiB due to SG table limitations.
>>> + * FIXME: Should maybe be handled as part of sg construction.
>>> */
>>> - if (!list_is_singular(&vres->blocks)) {
>>> - block = list_last_entry(&vres->blocks, typeof(*block),
>>> link);
>>> - list_move_tail(&block->link, &temp);
>>> - trim_list = &temp;
>>> - /*
>>> - * Compute the original_size value by subtracting the
>>> - * last block size with (aligned size - original size)
>>> - */
>>> - original_size = drm_buddy_block_size(mm, block) -
>>> - (size - cur_size);
>>> - }
>>> + u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
>>> +
>>> + err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn <<
>>> PAGE_SHIFT,
>>> + (u64)lpfn << PAGE_SHIFT,
>>> + alloc_size,
>>> + min_page_size,
>>> + &vres->blocks,
>>> + vres->flags);
>>> + if (err)
>>> + goto error_free_blocks;
>>> - mutex_lock(&mgr->lock);
>>> - drm_buddy_block_trim(mm,
>>> - original_size,
>>> - trim_list);
>>> - mutex_unlock(&mgr->lock);
>>> + remaining_size -= alloc_size;
>>> + } while (remaining_size);
>>> - if (!list_empty(&temp))
>>> - list_splice_tail(trim_list, &vres->blocks);
>>> + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> + if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
>>> + size = vres->base.size;
>>> }
>>> + if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
>>> + vres->used_visible_size = size;
>>> + } else {
>>> + struct drm_buddy_block *block;
>>> +
>>> + list_for_each_entry(block, &vres->blocks, link) {
>>> + u64 start = drm_buddy_block_offset(block);
>>> +
>>> + if (start < mgr->visible_size) {
>>> + u64 end = start + drm_buddy_block_size(mm, block);
>>> +
>>> + vres->used_visible_size +=
>>> + min(end, mgr->visible_size) - start;
>>> + }
>>> + }
>>> + }
>>> +
>>> + mgr->visible_avail -= vres->used_visible_size;
>>> + mutex_unlock(&mgr->lock);
>>> +
>>> if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>>> xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>>> vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>>> @@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct
>>> ttm_resource_manager *man,
>>> ttm_resource_fini(man, &vres->base);
>>> kfree(vres);
>>> - return r;
>>> + return err;
>>> }
>>> static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>>> @@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct
>>> ttm_resource_manager *man,
>>> mutex_lock(&mgr->lock);
>>> drm_buddy_free_list(mm, &vres->blocks);
>>> + mgr->visible_avail += vres->used_visible_size;
>>> mutex_unlock(&mgr->lock);
>>> ttm_resource_fini(man, res);
>>> @@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct
>>> ttm_resource_manager *man,
>>> struct drm_buddy *mm = &mgr->mm;
>>> mutex_lock(&mgr->lock);
>>> + drm_printf(printer, "default_page_size: %lluKiB\n",
>>> + mgr->default_page_size >> 10);
>>> + drm_printf(printer, "visible_avail: %lluMiB\n",
>>> + (u64)mgr->visible_avail >> 20);
>>> + drm_printf(printer, "visible_size: %lluMiB\n",
>>> + (u64)mgr->visible_size >> 20);
>>> +
>>> drm_buddy_print(mm, printer);
>>> mutex_unlock(&mgr->lock);
>>> drm_printf(printer, "man size:%llu\n", man->size);
>>> @@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device
>>> *dev, void *arg)
>>> if (ttm_resource_manager_evict_all(&xe->ttm, man))
>>> return;
>>> + WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>>> +
>>> drm_buddy_fini(&mgr->mm);
>>> ttm_resource_manager_cleanup(&mgr->manager);
>>> @@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device
>>> *dev, void *arg)
>>> }
>>> int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct
>>> xe_ttm_vram_mgr *mgr,
>>> - u32 mem_type, u64 size, u64 default_page_size)
>>> + u32 mem_type, u64 size, u64 io_size,
>>> + u64 default_page_size)
>>> {
>>> struct ttm_resource_manager *man = &mgr->manager;
>>> int err;
>>> @@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe,
>>> struct xe_ttm_vram_mgr *mgr,
>>> mgr->mem_type = mem_type;
>>> mutex_init(&mgr->lock);
>>> mgr->default_page_size = default_page_size;
>>> + mgr->visible_size = io_size;
>>> + mgr->visible_avail = io_size;
>>> ttm_resource_manager_init(man, &xe->ttm, size);
>>> err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
>>> @@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt,
>>> struct xe_ttm_vram_mgr *mgr)
>>> mgr->gt = gt;
>>> return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 +
>>> gt->info.vram_id,
>>> - gt->mem.vram.size, PAGE_SIZE);
>>> + gt->mem.vram.size, gt->mem.vram.io_size,
>>> + PAGE_SIZE);
>>> }
>>> int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> index 78f332d26224..35e5367a79fb 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> @@ -13,7 +13,8 @@ struct xe_device;
>>> struct xe_gt;
>>> int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct
>>> xe_ttm_vram_mgr *mgr,
>>> - u32 mem_type, u64 size, u64 default_page_size);
>>> + u32 mem_type, u64 size, u64 io_size,
>>> + u64 default_page_size);
>>> int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr
>>> *mgr);
>>> int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>>> struct ttm_resource *res,
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> index 39aa2ec1b968..3d9417ff7434 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> @@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
>>> struct ttm_resource_manager manager;
>>> /** @mm: DRM buddy allocator which manages the VRAM */
>>> struct drm_buddy mm;
>>> + /** @visible_size: Proped size of the CPU visible portion */
>>> + u64 visible_size;
>>> + /** @visible_avail: CPU visible portion still unallocated */
>>> + u64 visible_avail;
>>> /** @default_page_size: default page size */
>>> u64 default_page_size;
>>> /** @lock: protects allocations of VRAM */
>>> @@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
>>> struct ttm_resource base;
>>> /** @blocks: list of DRM buddy blocks */
>>> struct list_head blocks;
>>> + /** @used_visible_size: How many CPU visible bytes this
>>> resource is using */
>>> + u64 used_visible_size;
>>> /** @flags: flags associated with the resource */
>>> unsigned long flags;
>>> };
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-02-28 15:20 ` Matthew Auld
@ 2023-03-02 11:51 ` Maarten Lankhorst
2023-03-03 12:12 ` Matthew Auld
0 siblings, 1 reply; 27+ messages in thread
From: Maarten Lankhorst @ 2023-03-02 11:51 UTC (permalink / raw)
To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi
Hey,
On 2023-02-28 16:20, Matthew Auld wrote:
> On 28/02/2023 14:48, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 2023-02-28 11:41, Matthew Auld wrote:
>>> The display code wants to read the clear color value from the buffer.
>>> However if the buffer is the non-mappable part of lmem then we fail the
>>> kmap. The simplest solution is to just mark the buffer with
>>> XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
>>> mappable part of lmem, or migrate it there.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> index 65c0bc28a3d1..66e1309e21d8 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> @@ -203,6 +203,14 @@ static struct i915_vma *__xe_pin_fb_vma(struct
>>> intel_framebuffer *fb,
>>> if (ret)
>>> goto err;
>>> + /*
>>> + * For this type of buffer we need to able to read from the CPU
>>> the
>>> + * clear color value found in the buffer. This doesn't do
>>> anything on
>>> + * non small-bar devices.
>>> + */
>>> + if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
>>> + bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
>>
>> While we do need a change, do we need to do this inside pinning? We
>> could also require userspace to create VRAM BO's with CPU_ACCESS, and
>> reject CCS here.
>
> Do you mean reject CCS if CPU_ACCESS is not set? The trouble is that
> CPU_ACCESS also requires system memory as a spill option when
> specifying the placements, which then prevents using CCS. Or at least
> that's how it was in i915.
We should be able to handle it without moving to sysmem, if that's what
we need.
>>
>> Of course we should probably also add a UAPI to allow setting the
>> SCANOUT flag on externally imported DMA-BUF bo's, i don't think we
>> require CPU_ACCESS flag for that as it's not our BO.
>
> Hmm, maybe if SCANOUT then internally use the mappable part of lmem by
> default? For dumb buffers I think it does something like that. And
> keep the above as a fallback?
For dumb bo's, it is expected they are mapped, so we can set the
CPU_ACCESS flag. I don't know if we need it all the time, so for VRAM
bo's it could be specified as a separate flag, which is ignored for sysmem.
>
>>
>> Might require some more thinking on how we want to handle this.
>>
>> Other patches look good, except for the comments I had:
>>
>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-03-02 11:51 ` Maarten Lankhorst
@ 2023-03-03 12:12 ` Matthew Auld
2023-03-03 12:58 ` Maarten Lankhorst
0 siblings, 1 reply; 27+ messages in thread
From: Matthew Auld @ 2023-03-03 12:12 UTC (permalink / raw)
To: Maarten Lankhorst, intel-xe; +Cc: Lucas De Marchi
On 02/03/2023 11:51, Maarten Lankhorst wrote:
> Hey,
>
> On 2023-02-28 16:20, Matthew Auld wrote:
>> On 28/02/2023 14:48, Maarten Lankhorst wrote:
>>> Hey,
>>>
>>> On 2023-02-28 11:41, Matthew Auld wrote:
>>>> The display code wants to read the clear color value from the buffer.
>>>> However if the buffer is the non-mappable part of lmem then we fail the
>>>> kmap. The simplest solution is to just mark the buffer with
>>>> XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
>>>> mappable part of lmem, or migrate it there.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
>>>> 1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> index 65c0bc28a3d1..66e1309e21d8 100644
>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> @@ -203,6 +203,14 @@ static struct i915_vma *__xe_pin_fb_vma(struct
>>>> intel_framebuffer *fb,
>>>> if (ret)
>>>> goto err;
>>>> + /*
>>>> + * For this type of buffer we need to able to read from the CPU
>>>> the
>>>> + * clear color value found in the buffer. This doesn't do
>>>> anything on
>>>> + * non small-bar devices.
>>>> + */
>>>> + if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
>>>> + bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
>>>
>>> While we do need a change, do we need to do this inside pinning? We
>>> could also require userspace to create VRAM BO's with CPU_ACCESS, and
>>> reject CCS here.
>>
>> Do you mean reject CCS if CPU_ACCESS is not set? The trouble is that
>> CPU_ACCESS also requires system memory as a spill option when
>> specifying the placements, which then prevents using CCS. Or at least
>> that's how it was in i915.
> We should be able to handle it without moving to sysmem, if that's what
> we need.
Ok, so maybe drop NEEDS_CPU_ACCESS, and just have NEEDS_VISIBLE_VRAM
instead, and to get the old behaviour the user can just attach system
memory as an extra placement? And then for clear color stuff they can
use VISIBLE_VRAM by itself, and we can enforce that here?
>>>
>>> Of course we should probably also add a UAPI to allow setting the
>>> SCANOUT flag on externally imported DMA-BUF bo's, i don't think we
>>> require CPU_ACCESS flag for that as it's not our BO.
For discrete specifically, can you have externally imported dma-buf for
display (not originating from the same device)? The scanout needs vram
local to the device, right?
>>
>> Hmm, maybe if SCANOUT then internally use the mappable part of lmem by
>> default? For dumb buffers I think it does something like that. And
>> keep the above as a fallback?
> For dumb bo's, it is expected they are mapped, so we can set the
> CPU_ACCESS flag. I don't know if we need it all the time, so for VRAM
> bo's it could be specified as a separate flag, which is ignored for sysmem.
>>
>>>
>>> Might require some more thinking on how we want to handle this.
>>>
>>> Other patches look good, except for the comments I had:
>>>
>>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS
2023-03-03 12:12 ` Matthew Auld
@ 2023-03-03 12:58 ` Maarten Lankhorst
0 siblings, 0 replies; 27+ messages in thread
From: Maarten Lankhorst @ 2023-03-03 12:58 UTC (permalink / raw)
To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi
On 2023-03-03 13:12, Matthew Auld wrote:
> On 02/03/2023 11:51, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 2023-02-28 16:20, Matthew Auld wrote:
>>> On 28/02/2023 14:48, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> On 2023-02-28 11:41, Matthew Auld wrote:
>>>>> The display code wants to read the clear color value from the buffer.
>>>>> However if the buffer is the non-mappable part of lmem then we
>>>>> fail the
>>>>> kmap. The simplest solution is to just mark the buffer with
>>>>> XE_BO_NEEDS_CPU_ACCESS, which will either allocate the buffer in the
>>>>> mappable part of lmem, or migrate it there.
>>>>>
>>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 8 ++++++++
>>>>> 1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>> index 65c0bc28a3d1..66e1309e21d8 100644
>>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>> @@ -203,6 +203,14 @@ static struct i915_vma
>>>>> *__xe_pin_fb_vma(struct intel_framebuffer *fb,
>>>>> if (ret)
>>>>> goto err;
>>>>> + /*
>>>>> + * For this type of buffer we need to able to read from the
>>>>> CPU the
>>>>> + * clear color value found in the buffer. This doesn't do
>>>>> anything on
>>>>> + * non small-bar devices.
>>>>> + */
>>>>> + if (intel_fb_rc_ccs_cc_plane(&fb->base) >= 0)
>>>>> + bo->flags |= XE_BO_NEEDS_CPU_ACCESS;
>>>>
>>>> While we do need a change, do we need to do this inside pinning? We
>>>> could also require userspace to create VRAM BO's with CPU_ACCESS,
>>>> and reject CCS here.
>>>
>>> Do you mean reject CCS if CPU_ACCESS is not set? The trouble is that
>>> CPU_ACCESS also requires system memory as a spill option when
>>> specifying the placements, which then prevents using CCS. Or at
>>> least that's how it was in i915.
>> We should be able to handle it without moving to sysmem, if that's
>> what we need.
>
> Ok, so maybe drop NEEDS_CPU_ACCESS, and just have NEEDS_VISIBLE_VRAM
> instead, and to get the old behaviour the user can just attach system
> memory as an extra placement? And then for clear color stuff they can
> use VISIBLE_VRAM by itself, and we can enforce that here?
You can't put a FB In sysmem anyway, so we need to pass a flag to put it
in visible VRAM.
>>>>
>>>> Of course we should probably also add a UAPI to allow setting the
>>>> SCANOUT flag on externally imported DMA-BUF bo's, i don't think we
>>>> require CPU_ACCESS flag for that as it's not our BO.
>
> For discrete specifically, can you have externally imported dma-buf
> for display (not originating from the same device)? The scanout needs
> vram local to the device, right?
I don't think it's currently possible, I have no idea whether external
VRAM is possible, I think it would depend on the hardware, but if it was
supported, I would imagine sysmem would also have been.
>>>
>>> Hmm, maybe if SCANOUT then internally use the mappable part of lmem
>>> by default? For dumb buffers I think it does something like that.
>>> And keep the above as a fallback?
>> For dumb bo's, it is expected they are mapped, so we can set the
>> CPU_ACCESS flag. I don't know if we need it all the time, so for VRAM
>> bo's it could be specified as a separate flag, which is ignored for
>> sysmem.
>>>
>>>>
>>>> Might require some more thinking on how we want to handle this.
>>>>
>>>> Other patches look good, except for the comments I had:
>>>>
>>>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>
~Maarten
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2023-03-06 14:41 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-28 10:41 [Intel-xe] [PATCH v2 00/14] small-bar support Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 01/14] drm/xe/display: fix IS_ALDERLAKE_P() Matthew Auld
2023-02-28 14:09 ` Maarten Lankhorst
2023-02-28 17:57 ` Matt Roper
2023-02-28 18:54 ` Lucas De Marchi
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 02/14] drm/xe/display: fix bo leak when unloading module Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 03/14] drm/xe: prefer xe_bo_create_pin_map() Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 04/14] drm/xe/bo: explicitly reject zero sized BO Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 05/14] drm/xe/mmio: s/lmem/vram/ Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 06/14] drm/xe/vram: start tracking the io_size Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 07/14] drm/xe/buddy: remove the virtualized start Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 08/14] drm/xe/buddy: add visible tracking Matthew Auld
2023-02-28 14:24 ` Maarten Lankhorst
2023-02-28 15:05 ` Matthew Auld
2023-02-28 20:26 ` Maarten Lankhorst
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 09/14] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 10/14] drm/xe/bo: support tiered vram allocation for small-bar Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 11/14] drm/xe/migrate: retain CCS aux state for vram -> vram Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 12/14] drm/xe/display: annotate CC buffers with NEEDS_CPU_ACCESS Matthew Auld
2023-02-28 14:48 ` Maarten Lankhorst
2023-02-28 15:20 ` Matthew Auld
2023-03-02 11:51 ` Maarten Lankhorst
2023-03-03 12:12 ` Matthew Auld
2023-03-03 12:58 ` Maarten Lankhorst
2023-02-28 15:22 ` Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 13/14] drm/xe/uapi: add the userspace bits for small-bar Matthew Auld
2023-02-28 10:41 ` [Intel-xe] [PATCH v2 14/14] drm/xe: fully turn on small-bar support Matthew Auld
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.