* [PATCH v4 0/7] drm/panthor: support repeated mappings
@ 2025-07-07 17:04 Caterina Shablia
2025-07-07 17:04 ` [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates Caterina Shablia
` (6 more replies)
0 siblings, 7 replies; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
This patch series adds an OP_MAP_REPEAT flag, which lets the user map a BO
region repeatedly over an address range with just one map operation.
Sparse resources in the Vulkan API let the user leave regions of a
resource unmapped (from the API perspective). Accesses to such regions
must not result in program termination, but loads produce undefined
values.
On Mali hardware, Vulkan sparse unmap is implemented by mapping the
specified region to a "dummy BO" so that accesses do not fault. A newly
created sparse resource starts off unmapped, and therefore also has to be
mapped to the "dummy BO". This "dummy BO" is small (one page) compared to
the sizes of the VA ranges we might want to map to it, so a large number
of vm_bind ops can be necessary. For example, if the user were to create a
100e6-byte sparse resident resource, we'd have to poke VM_BIND with
ceil(100e6/0x1000)=24415 map operations.
OP_MAP_REPEAT addresses this inefficiency by letting us implement a Vulkan
sparse unmap, or the initialization of a sparse resident resource, with a
single map operation.
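For illustration, initializing (or sparse-unmapping) a ~100 MiB range then
boils down to a single bind op along these lines (a rough sketch against the
uapi added in patch 7; the BO handle and VA values are placeholders):

    struct drm_panthor_vm_bind_op op = {
            .flags = DRM_PANTHOR_VM_BIND_OP_TYPE_MAP |
                     DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT,
            .bo_handle = dummy_bo_handle,   /* page-sized dummy BO */
            .bo_offset = 0,
            .bo_repeat_range = 0x1000,      /* BO portion to repeat */
            .va = resource_va,              /* start of the sparse range */
            .size = 100ull << 20,           /* must be a multiple of bo_repeat_range */
    };

instead of tens of thousands of individual page-sized map operations.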
The panvk changes making use of this uapi can be found at
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35287
Changes in v4:
- Fixed the warnings reported by the kernel test robot.
https://lore.kernel.org/oe-kbuild-all/202507041635.WyDu3TQ1-lkp@intel.com/
- Fixed the warnings reported by the CI.
https://patchwork.freedesktop.org/series/151264/
No changes in v3.
Changes in v2:
- Make panthor use this stuff.
- Make it possible to express a repeated mapping of any suitably sized
and aligned range of a BO, rather than strictly the page-sized
prefix, generalizing the API. Rename DRM_GPUVA_SINGLE_PAGE to
DRM_GPUVA_REPEAT.
- Clean up parts of drm/gpuvm affected by these changes.
Link to v1: https://lore.kernel.org/lkml/20250202-gpuvm-single-page-v1-0-8cbd44fdcbd4@asahilina.net/
Asahi Lina (2):
drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map
drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic
Boris Brezillon (5):
drm/panthor: Add support for atomic page table updates
drm/gpuvm: Kill drm_gpuva_init()
drm/gpuvm: Pass map arguments through a struct
drm/gpuvm: Add a helper to check if two VA can be merged
drm/panthor: Add support for repeated mappings
drivers/gpu/drm/drm_gpuvm.c | 182 ++++++++++++++++++-------
drivers/gpu/drm/imagination/pvr_vm.c | 15 +-
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 11 +-
drivers/gpu/drm/panthor/panthor_drv.c | 3 +-
drivers/gpu/drm/panthor/panthor_mmu.c | 154 +++++++++++++++++++--
drivers/gpu/drm/xe/xe_vm.c | 13 +-
include/drm/drm_gpuvm.h | 101 +++++++++++---
include/uapi/drm/panthor_drm.h | 23 ++++
8 files changed, 408 insertions(+), 94 deletions(-)
base-commit: 482c7e296edc0f594e8869a789a40be53c49bd6a
--
2.47.2
* [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-11 13:30 ` Steven Price
2025-07-07 17:04 ` [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init() Caterina Shablia
` (5 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
From: Boris Brezillon <boris.brezillon@collabora.com>
Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
we can implement truly atomic page table updates, where any GPU access
in the locked range has to wait for the page table updates to land
before proceeding.
This is needed for vkQueueBindSparse(), so we can replace the dummy
page mapped over the entire object with actual BO-backed pages in an
atomic way.
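With this patch, the map/unmap flow in panthor_vm_exec_op() becomes roughly
the following (a simplified sketch of the diff below, error handling trimmed):

    ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
    if (!ret) {
            /* Page table updates happen while the region is locked, and
             * panthor_vm_{map,unmap}_pages() no longer flush in between.
             */
            ret = drm_gpuvm_sm_map(&vm->base, vm, op->va.addr, op->va.range,
                                   op->map.vm_bo->obj, op->map.bo_offset);

            /* A single FLUSH_MEM then makes the updates visible and
             * releases the locked region.
             */
            panthor_vm_unlock_region(vm);
    }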
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
1 file changed, 62 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index b39ea6acc6a9..9caaba03c5eb 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -387,6 +387,15 @@ struct panthor_vm {
* flagged as faulty as a result.
*/
bool unhandled_fault;
+
+ /** @locked_region: Information about the currently locked region. */
+ struct {
+ /** @locked_region.start: Start of the locked region. */
+ u64 start;
+
+ /** @locked_region.size: Size of the locked region. */
+ u64 size;
+ } locked_region;
};
/**
@@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
}
ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
+ if (!ret && vm->locked_region.size) {
+ lock_region(ptdev, vm->as.id, vm->locked_region.start, vm->locked_region.size);
+ ret = wait_ready(ptdev, vm->as.id);
+ }
out_make_active:
if (!ret) {
@@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm *vm, u64 iova, u64 size)
struct io_pgtable_ops *ops = vm->pgtbl_ops;
u64 offset = 0;
+ drm_WARN_ON(&ptdev->base,
+ (iova < vm->locked_region.start) ||
+ (iova + size > vm->locked_region.start + vm->locked_region.size));
drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm->as.id, iova, size);
while (offset < size) {
@@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct panthor_vm *vm, u64 iova, u64 size)
iova + offset + unmapped_sz,
iova + offset + pgsize * pgcount,
iova, iova + size);
- panthor_vm_flush_range(vm, iova, offset + unmapped_sz);
return -EINVAL;
}
offset += unmapped_sz;
}
- return panthor_vm_flush_range(vm, iova, size);
+ return 0;
}
static int
@@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
if (!size)
return 0;
+ drm_WARN_ON(&ptdev->base,
+ (iova < vm->locked_region.start) ||
+ (iova + size > vm->locked_region.start + vm->locked_region.size));
+
for_each_sgtable_dma_sg(sgt, sgl, count) {
dma_addr_t paddr = sg_dma_address(sgl);
size_t len = sg_dma_len(sgl);
@@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
offset = 0;
}
- return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
+ return 0;
}
static int flags_to_prot(u32 flags)
@@ -1654,6 +1673,38 @@ static const char *access_type_name(struct panthor_device *ptdev,
}
}
+static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
+{
+ struct panthor_device *ptdev = vm->ptdev;
+ int ret = 0;
+
+ mutex_lock(&ptdev->mmu->as.slots_lock);
+ drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
+ vm->locked_region.start = start;
+ vm->locked_region.size = size;
+ if (vm->as.id >= 0) {
+ lock_region(ptdev, vm->as.id, start, size);
+ ret = wait_ready(ptdev, vm->as.id);
+ }
+ mutex_unlock(&ptdev->mmu->as.slots_lock);
+
+ return ret;
+}
+
+static void panthor_vm_unlock_region(struct panthor_vm *vm)
+{
+ struct panthor_device *ptdev = vm->ptdev;
+
+ mutex_lock(&ptdev->mmu->as.slots_lock);
+ if (vm->as.id >= 0) {
+ write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
+ drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm->as.id));
+ }
+ vm->locked_region.start = 0;
+ vm->locked_region.size = 0;
+ mutex_unlock(&ptdev->mmu->as.slots_lock);
+}
+
static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
{
bool has_unhandled_faults = false;
@@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
mutex_lock(&vm->op_lock);
vm->op_ctx = op;
+
+ ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
+ if (ret)
+ goto out;
+
switch (op_type) {
case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
if (vm->unusable) {
@@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
break;
}
+ panthor_vm_unlock_region(vm);
+
+out:
if (ret && flag_vm_unusable_on_failure)
vm->unusable = true;
--
2.47.2
* [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init()
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
2025-07-07 17:04 ` [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-07 18:41 ` Danilo Krummrich
2025-07-11 13:30 ` Steven Price
2025-07-07 17:04 ` [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct Caterina Shablia
` (4 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
From: Boris Brezillon <boris.brezillon@collabora.com>
drm_gpuva_init() only has one internal user, and given we are about to
add new optional fields, it only adds maintenance burden for no real
benefit, so let's kill the thing now.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
include/drm/drm_gpuvm.h | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 2a9629377633..6fdf2aff3e90 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -160,15 +160,6 @@ struct drm_gpuva *drm_gpuva_find_first(struct drm_gpuvm *gpuvm,
struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuvm *gpuvm, u64 start);
struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuvm *gpuvm, u64 end);
-static inline void drm_gpuva_init(struct drm_gpuva *va, u64 addr, u64 range,
- struct drm_gem_object *obj, u64 offset)
-{
- va->va.addr = addr;
- va->va.range = range;
- va->gem.obj = obj;
- va->gem.offset = offset;
-}
-
/**
* drm_gpuva_invalidate() - sets whether the backing GEM of this &drm_gpuva is
* invalidated
@@ -1079,8 +1070,10 @@ void drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
static inline void drm_gpuva_init_from_op(struct drm_gpuva *va,
struct drm_gpuva_op_map *op)
{
- drm_gpuva_init(va, op->va.addr, op->va.range,
- op->gem.obj, op->gem.offset);
+ va->va.addr = op->va.addr;
+ va->va.range = op->va.range;
+ va->gem.obj = op->gem.obj;
+ va->gem.offset = op->gem.offset;
}
/**
--
2.47.2
* [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
2025-07-07 17:04 ` [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates Caterina Shablia
2025-07-07 17:04 ` [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init() Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-07 18:44 ` Danilo Krummrich
2025-07-07 17:04 ` [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged Caterina Shablia
` (3 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
From: Boris Brezillon <boris.brezillon@collabora.com>
We are about to pass more arguments to drm_gpuvm_sm_map[_ops_create](),
so, before we do that, let's pass arguments through a struct instead
of changing each call site every time a new optional argument is added.
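In practice each call site turns into something like the following sketch,
instead of passing addr/range/obj/offset individually:

    struct drm_gpuvm_map_req map_req = {
            .va.addr = addr,
            .va.range = range,
            .gem.obj = obj,
            .gem.offset = offset,
    };

    ret = drm_gpuvm_sm_map(gpuvm, priv, &map_req);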
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
drivers/gpu/drm/drm_gpuvm.c | 76 ++++++++++----------------
drivers/gpu/drm/imagination/pvr_vm.c | 15 +++--
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 11 ++--
drivers/gpu/drm/panthor/panthor_mmu.c | 13 ++++-
drivers/gpu/drm/xe/xe_vm.c | 13 ++++-
include/drm/drm_gpuvm.h | 34 ++++++++++--
6 files changed, 97 insertions(+), 65 deletions(-)
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index e89b932e987c..05978c5c38b1 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -2054,16 +2054,15 @@ EXPORT_SYMBOL_GPL(drm_gpuva_unmap);
static int
op_map_cb(const struct drm_gpuvm_ops *fn, void *priv,
- u64 addr, u64 range,
- struct drm_gem_object *obj, u64 offset)
+ const struct drm_gpuvm_map_req *req)
{
struct drm_gpuva_op op = {};
op.op = DRM_GPUVA_OP_MAP;
- op.map.va.addr = addr;
- op.map.va.range = range;
- op.map.gem.obj = obj;
- op.map.gem.offset = offset;
+ op.map.va.addr = req->va.addr;
+ op.map.va.range = req->va.range;
+ op.map.gem.obj = req->gem.obj;
+ op.map.gem.offset = req->gem.offset;
return fn->sm_step_map(&op, priv);
}
@@ -2102,17 +2101,16 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
static int
__drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
const struct drm_gpuvm_ops *ops, void *priv,
- u64 req_addr, u64 req_range,
- struct drm_gem_object *req_obj, u64 req_offset)
+ const struct drm_gpuvm_map_req *req)
{
struct drm_gpuva *va, *next;
- u64 req_end = req_addr + req_range;
+ u64 req_end = req->va.addr + req->va.range;
int ret;
- if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
+ if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
return -EINVAL;
- drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr, req_end) {
+ drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req->va.addr, req_end) {
struct drm_gem_object *obj = va->gem.obj;
u64 offset = va->gem.offset;
u64 addr = va->va.addr;
@@ -2120,9 +2118,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
u64 end = addr + range;
bool merge = !!va->gem.obj;
- if (addr == req_addr) {
- merge &= obj == req_obj &&
- offset == req_offset;
+ if (addr == req->va.addr) {
+ merge &= obj == req->gem.obj &&
+ offset == req->gem.offset;
if (end == req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
@@ -2141,9 +2139,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
if (end > req_end) {
struct drm_gpuva_op_map n = {
.va.addr = req_end,
- .va.range = range - req_range,
+ .va.range = range - req->va.range,
.gem.obj = obj,
- .gem.offset = offset + req_range,
+ .gem.offset = offset + req->va.range,
};
struct drm_gpuva_op_unmap u = {
.va = va,
@@ -2155,8 +2153,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
return ret;
break;
}
- } else if (addr < req_addr) {
- u64 ls_range = req_addr - addr;
+ } else if (addr < req->va.addr) {
+ u64 ls_range = req->va.addr - addr;
struct drm_gpuva_op_map p = {
.va.addr = addr,
.va.range = ls_range,
@@ -2165,8 +2163,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
};
struct drm_gpuva_op_unmap u = { .va = va };
- merge &= obj == req_obj &&
- offset + ls_range == req_offset;
+ merge &= obj == req->gem.obj &&
+ offset + ls_range == req->gem.offset;
u.keep = merge;
if (end == req_end) {
@@ -2189,7 +2187,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.range = end - req_end,
.gem.obj = obj,
.gem.offset = offset + ls_range +
- req_range,
+ req->va.range,
};
ret = op_remap_cb(ops, priv, &p, &n, &u);
@@ -2197,10 +2195,10 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
return ret;
break;
}
- } else if (addr > req_addr) {
- merge &= obj == req_obj &&
- offset == req_offset +
- (addr - req_addr);
+ } else if (addr > req->va.addr) {
+ merge &= obj == req->gem.obj &&
+ offset == req->gem.offset +
+ (addr - req->va.addr);
if (end == req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
@@ -2236,9 +2234,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
}
}
- return op_map_cb(ops, priv,
- req_addr, req_range,
- req_obj, req_offset);
+ return op_map_cb(ops, priv, req);
}
static int
@@ -2302,11 +2298,8 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
/**
* drm_gpuvm_sm_map() - creates the &drm_gpuva_op split/merge steps
* @gpuvm: the &drm_gpuvm representing the GPU VA space
- * @req_addr: the start address of the new mapping
- * @req_range: the range of the new mapping
- * @req_obj: the &drm_gem_object to map
- * @req_offset: the offset within the &drm_gem_object
* @priv: pointer to a driver private data structure
+ * @req: map request information
*
* This function iterates the given range of the GPU VA space. It utilizes the
* &drm_gpuvm_ops to call back into the driver providing the split and merge
@@ -2333,8 +2326,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
*/
int
drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
- u64 req_addr, u64 req_range,
- struct drm_gem_object *req_obj, u64 req_offset)
+ const struct drm_gpuvm_map_req *req)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
@@ -2343,9 +2335,7 @@ drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
ops->sm_step_unmap)))
return -EINVAL;
- return __drm_gpuvm_sm_map(gpuvm, ops, priv,
- req_addr, req_range,
- req_obj, req_offset);
+ return __drm_gpuvm_sm_map(gpuvm, ops, priv, req);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_sm_map);
@@ -2485,10 +2475,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
/**
* drm_gpuvm_sm_map_ops_create() - creates the &drm_gpuva_ops to split and merge
* @gpuvm: the &drm_gpuvm representing the GPU VA space
- * @req_addr: the start address of the new mapping
- * @req_range: the range of the new mapping
- * @req_obj: the &drm_gem_object to map
- * @req_offset: the offset within the &drm_gem_object
+ * @req: map request arguments
*
* This function creates a list of operations to perform splitting and merging
* of existent mapping(s) with the newly requested one.
@@ -2516,8 +2503,7 @@ static const struct drm_gpuvm_ops gpuvm_list_ops = {
*/
struct drm_gpuva_ops *
drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
- u64 req_addr, u64 req_range,
- struct drm_gem_object *req_obj, u64 req_offset)
+ const struct drm_gpuvm_map_req *req)
{
struct drm_gpuva_ops *ops;
struct {
@@ -2535,9 +2521,7 @@ drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
args.vm = gpuvm;
args.ops = ops;
- ret = __drm_gpuvm_sm_map(gpuvm, &gpuvm_list_ops, &args,
- req_addr, req_range,
- req_obj, req_offset);
+ ret = __drm_gpuvm_sm_map(gpuvm, &gpuvm_list_ops, &args, req);
if (ret)
goto err_free_ops;
diff --git a/drivers/gpu/drm/imagination/pvr_vm.c b/drivers/gpu/drm/imagination/pvr_vm.c
index 2896fa7501b1..abfdcd279363 100644
--- a/drivers/gpu/drm/imagination/pvr_vm.c
+++ b/drivers/gpu/drm/imagination/pvr_vm.c
@@ -185,12 +185,17 @@ struct pvr_vm_bind_op {
static int pvr_vm_bind_op_exec(struct pvr_vm_bind_op *bind_op)
{
switch (bind_op->type) {
- case PVR_VM_BIND_TYPE_MAP:
+ case PVR_VM_BIND_TYPE_MAP: {
+ const struct drm_gpuvm_map_req map_req = {
+ .va.addr = bind_op->device_addr,
+ .va.range = bind_op->size,
+ .gem.obj = gem_from_pvr_gem(bind_op->pvr_obj),
+ .gem.offset = bind_op->offset,
+ };
+
return drm_gpuvm_sm_map(&bind_op->vm_ctx->gpuvm_mgr,
- bind_op, bind_op->device_addr,
- bind_op->size,
- gem_from_pvr_gem(bind_op->pvr_obj),
- bind_op->offset);
+ bind_op, &map_req);
+ }
case PVR_VM_BIND_TYPE_UNMAP:
return drm_gpuvm_sm_unmap(&bind_op->vm_ctx->gpuvm_mgr,
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 48f105239f42..b481700be666 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -1276,6 +1276,12 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job,
break;
case OP_MAP: {
struct nouveau_uvma_region *reg;
+ struct drm_gpuvm_map_req map_req = {
+ .va.addr = op->va.addr,
+ .va.range = op->va.range,
+ .gem.obj = op->gem.obj,
+ .gem.offset = op->gem.offset,
+ };
reg = nouveau_uvma_region_find_first(uvmm,
op->va.addr,
@@ -1301,10 +1307,7 @@ nouveau_uvmm_bind_job_submit(struct nouveau_job *job,
}
op->ops = drm_gpuvm_sm_map_ops_create(&uvmm->base,
- op->va.addr,
- op->va.range,
- op->gem.obj,
- op->gem.offset);
+ &map_req);
if (IS_ERR(op->ops)) {
ret = PTR_ERR(op->ops);
goto unwind_continue;
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index 9caaba03c5eb..f0a22b775958 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -2236,15 +2236,22 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
goto out;
switch (op_type) {
- case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
+ case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP: {
+ const struct drm_gpuvm_map_req map_req = {
+ .va.addr = op->va.addr,
+ .va.range = op->va.range,
+ .gem.obj = op->map.vm_bo->obj,
+ .gem.offset = op->map.bo_offset,
+ };
+
if (vm->unusable) {
ret = -EINVAL;
break;
}
- ret = drm_gpuvm_sm_map(&vm->base, vm, op->va.addr, op->va.range,
- op->map.vm_bo->obj, op->map.bo_offset);
+ ret = drm_gpuvm_sm_map(&vm->base, vm, &map_req);
break;
+ }
case DRM_PANTHOR_VM_BIND_OP_TYPE_UNMAP:
ret = drm_gpuvm_sm_unmap(&vm->base, vm, op->va.addr, op->va.range);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 861577746929..80bc741bdb6b 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2246,10 +2246,17 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
switch (operation) {
case DRM_XE_VM_BIND_OP_MAP:
- case DRM_XE_VM_BIND_OP_MAP_USERPTR:
- ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, addr, range,
- obj, bo_offset_or_userptr);
+ case DRM_XE_VM_BIND_OP_MAP_USERPTR: {
+ struct drm_gpuvm_map_req map_req = {
+ .va.addr = addr,
+ .va.range = range,
+ .gem.obj = obj,
+ .gem.offset = bo_offset_or_userptr,
+ };
+
+ ops = drm_gpuvm_sm_map_ops_create(&vm->gpuvm, &map_req);
break;
+ }
case DRM_XE_VM_BIND_OP_UNMAP:
ops = drm_gpuvm_sm_unmap_ops_create(&vm->gpuvm, addr, range);
break;
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 6fdf2aff3e90..a6e6c33fc10b 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -1049,10 +1049,37 @@ struct drm_gpuva_ops {
*/
#define drm_gpuva_next_op(op) list_next_entry(op, entry)
+/**
+ * struct drm_gpuvm_map_req - arguments passed to drm_gpuvm_sm_map[_ops_create]()
+ */
+struct drm_gpuvm_map_req {
+ /** @va: virtual address related fields */
+ struct {
+ /** @va.addr: start of the virtual address range to map to */
+ u64 addr;
+
+ /** @va.range: size of the virtual address range to map to */
+ u64 range;
+ } va;
+
+ /** @gem: GEM related fields */
+ struct {
+ /**
+ * @obj: GEM object to map.
+ *
+ * Can be NULL if the virtual range is not backed by a GEM object.
+ */
+ struct drm_gem_object *obj;
+
+ /** @offset: offset in the GEM */
+ u64 offset;
+ } gem;
+};
+
struct drm_gpuva_ops *
drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
- u64 addr, u64 range,
- struct drm_gem_object *obj, u64 offset);
+ const struct drm_gpuvm_map_req *req);
+
struct drm_gpuva_ops *
drm_gpuvm_sm_unmap_ops_create(struct drm_gpuvm *gpuvm,
u64 addr, u64 range);
@@ -1198,8 +1225,7 @@ struct drm_gpuvm_ops {
};
int drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
- u64 addr, u64 range,
- struct drm_gem_object *obj, u64 offset);
+ const struct drm_gpuvm_map_req *req);
int drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm, void *priv,
u64 addr, u64 range);
--
2.47.2
* [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
` (2 preceding siblings ...)
2025-07-07 17:04 ` [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-07 19:00 ` Danilo Krummrich
2025-07-22 19:17 ` Adrian Larumbe
2025-07-07 17:04 ` [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map Caterina Shablia
` (2 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
From: Boris Brezillon <boris.brezillon@collabora.com>
We are going to add flags/properties that will impact the VA merging
ability. Instead of sprinkling tests all over the place in
__drm_gpuvm_sm_map(), let's add a helper aggregating all these checks
and call it for every existing VA we walk through in the
__drm_gpuvm_sm_map() loop.
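For example, with illustrative values, the last check in can_merge() reduces
to comparing the VA delta against the GEM offset delta:

    /* Same GEM object, same flags:
     *   a: va.addr = 0x10000, va.range = 0x4000, gem.offset = 0x0
     *   b: va.addr = 0x14000, va.range = 0x2000, gem.offset = 0x4000
     * VA diff (0x4000) == GEM offset diff (0x4000) -> mergeable.
     * If b's gem.offset were 0x5000 instead, the diff check would fail
     * and the VAs would not be merged.
     */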
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
drivers/gpu/drm/drm_gpuvm.c | 47 +++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 05978c5c38b1..dc3c2f906400 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
return fn->sm_step_unmap(&op, priv);
}
+static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
+ const struct drm_gpuva *b)
+{
+ /* Only GEM-based mappings can be merged, and they must point to
+ * the same GEM object.
+ */
+ if (a->gem.obj != b->gem.obj || !a->gem.obj)
+ return false;
+
+ /* Let's keep things simple for now and force all flags to match. */
+ if (a->flags != b->flags)
+ return false;
+
+ /* Order VAs for the rest of the checks. */
+ if (a->va.addr > b->va.addr)
+ swap(a, b);
+
+ /* We assume the caller already checked that VAs overlap or are
+ * contiguous.
+ */
+ if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
+ return false;
+
+ /* We intentionally ignore u64 underflows because all we care about
+ * here is whether the VA diff matches the GEM offset diff.
+ */
+ return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
+}
+
static int
__drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
const struct drm_gpuvm_ops *ops, void *priv,
const struct drm_gpuvm_map_req *req)
{
struct drm_gpuva *va, *next;
+ struct drm_gpuva reqva = {
+ .va.addr = req->va.addr,
+ .va.range = req->va.range,
+ .gem.offset = req->gem.offset,
+ .gem.obj = req->gem.obj,
+ .flags = req->flags,
+ };
u64 req_end = req->va.addr + req->va.range;
int ret;
@@ -2116,12 +2152,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
u64 addr = va->va.addr;
u64 range = va->va.range;
u64 end = addr + range;
- bool merge = !!va->gem.obj;
+ bool merge = can_merge(gpuvm, va, &reqva);
if (addr == req->va.addr) {
- merge &= obj == req->gem.obj &&
- offset == req->gem.offset;
-
if (end == req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
@@ -2163,8 +2196,6 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
};
struct drm_gpuva_op_unmap u = { .va = va };
- merge &= obj == req->gem.obj &&
- offset + ls_range == req->gem.offset;
u.keep = merge;
if (end == req_end) {
@@ -2196,10 +2227,6 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
break;
}
} else if (addr > req->va.addr) {
- merge &= obj == req->gem.obj &&
- offset == req->gem.offset +
- (addr - req->va.addr);
-
if (end == req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
--
2.47.2
* [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
` (3 preceding siblings ...)
2025-07-07 17:04 ` [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-07 19:03 ` Danilo Krummrich
2025-07-22 19:21 ` Adrian Larumbe
2025-07-07 17:04 ` [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic Caterina Shablia
2025-07-07 17:04 ` [PATCH v4 7/7] drm/panthor: Add support for repeated mappings Caterina Shablia
6 siblings, 2 replies; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
From: Asahi Lina <lina@asahilina.net>
drm_gpuva objects have a flags field. Currently, this can be managed by
drivers out-of-band, without any special handling in drm_gpuvm.
To be able to introduce flags that do affect the logic in the drm_gpuvm
core, we need to plumb it through the map calls. This will allow the
core to check the flags on map and alter the merge/split logic depending
on the requested flags and the flags of the existing drm_gpuva ranges
that are being split.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
drivers/gpu/drm/drm_gpuvm.c | 7 +++++++
include/drm/drm_gpuvm.h | 9 +++++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index dc3c2f906400..dd949a8853b0 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -2063,6 +2063,7 @@ op_map_cb(const struct drm_gpuvm_ops *fn, void *priv,
op.map.va.range = req->va.range;
op.map.gem.obj = req->gem.obj;
op.map.gem.offset = req->gem.offset;
+ op.map.flags = req->flags;
return fn->sm_step_map(&op, priv);
}
@@ -2175,6 +2176,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.range = range - req->va.range,
.gem.obj = obj,
.gem.offset = offset + req->va.range,
+ .flags = va->flags,
};
struct drm_gpuva_op_unmap u = {
.va = va,
@@ -2193,6 +2195,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.range = ls_range,
.gem.obj = obj,
.gem.offset = offset,
+ .flags = va->flags,
};
struct drm_gpuva_op_unmap u = { .va = va };
@@ -2219,6 +2222,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.gem.obj = obj,
.gem.offset = offset + ls_range +
req->va.range,
+ .flags = va->flags,
};
ret = op_remap_cb(ops, priv, &p, &n, &u);
@@ -2247,6 +2251,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.range = end - req_end,
.gem.obj = obj,
.gem.offset = offset + req_end - addr,
+ .flags = va->flags,
};
struct drm_gpuva_op_unmap u = {
.va = va,
@@ -2290,6 +2295,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
prev.va.range = req_addr - addr;
prev.gem.obj = obj;
prev.gem.offset = offset;
+ prev.flags = va->flags;
prev_split = true;
}
@@ -2299,6 +2305,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
next.va.range = end - req_end;
next.gem.obj = obj;
next.gem.offset = offset + (req_end - addr);
+ next.flags = va->flags;
next_split = true;
}
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index a6e6c33fc10b..f77a89e791f1 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -847,6 +847,11 @@ struct drm_gpuva_op_map {
*/
struct drm_gem_object *obj;
} gem;
+
+ /**
+ * @flags: requested flags for the &drm_gpuva for this mapping
+ */
+ enum drm_gpuva_flags flags;
};
/**
@@ -1074,6 +1079,9 @@ struct drm_gpuvm_map_req {
/** @offset: offset in the GEM */
u64 offset;
} gem;
+
+ /** @flags: combination of DRM_GPUVA_ flags describing the mapping properties. */
+ enum drm_gpuva_flags flags;
};
struct drm_gpuva_ops *
@@ -1097,6 +1105,7 @@ void drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
static inline void drm_gpuva_init_from_op(struct drm_gpuva *va,
struct drm_gpuva_op_map *op)
{
+ va->flags = op->flags;
va->va.addr = op->va.addr;
va->va.range = op->va.range;
va->gem.obj = op->gem.obj;
--
2.47.2
* [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
` (4 preceding siblings ...)
2025-07-07 17:04 ` [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-07 19:33 ` Danilo Krummrich
2025-07-07 17:04 ` [PATCH v4 7/7] drm/panthor: Add support for repeated mappings Caterina Shablia
6 siblings, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia, Asahi Lina
From: Asahi Lina <lina+kernel@asahilina.net>
To be able to support "fake sparse" mappings without relying on GPU page
fault handling, drivers may need to create large (e.g. 4GiB) mappings of
the same page repeatedly (or same range of pages). Doing this through
individual mappings would be very wasteful. This can be handled better
by using a flag on map creation, but to do it safely, drm_gpuvm needs to
be aware of this special case.
Add a flag that signals that a given mapping is a page mapping, which is
repeated over the entire requested VA range. This tweaks the sm_map()
logic to treat GEM offsets differently for repeated mappings, so they
are not incremented as they would be for regular mappings.
The size of the GEM portion to repeat is passed through
drm_gpuva::gem::range. Most of the time it will be a page size, but it
can be bigger as long as it's less than drm_gpuva::va::range, and
drm_gpuva::va::range is a multiple of drm_gpuva::gem::range.
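A repeated mapping request then looks roughly like the sketch below (using
the map request struct introduced earlier in the series; the SZ_* values are
just an example):

    struct drm_gpuvm_map_req req = {
            .va.addr = va,
            .va.range = SZ_64M,     /* must be a multiple of gem.range */
            .gem.obj = obj,
            .gem.offset = 0,
            .gem.range = SZ_4K,     /* BO portion repeated over va.range */
            .flags = DRM_GPUVA_REPEAT,
    };

    ret = drm_gpuvm_sm_map(gpuvm, priv, &req);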
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
drivers/gpu/drm/drm_gpuvm.c | 72 +++++++++++++++++++++++++++++++++----
include/drm/drm_gpuvm.h | 43 +++++++++++++++++++++-
2 files changed, 108 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index dd949a8853b0..817393596149 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -2063,6 +2063,7 @@ op_map_cb(const struct drm_gpuvm_ops *fn, void *priv,
op.map.va.range = req->va.range;
op.map.gem.obj = req->gem.obj;
op.map.gem.offset = req->gem.offset;
+ op.map.gem.range = req->gem.range;
op.map.flags = req->flags;
return fn->sm_step_map(&op, priv);
@@ -2122,12 +2123,53 @@ static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
return false;
+ if (a->flags & DRM_GPUVA_REPEAT) {
+ u64 va_diff = b->va.addr - a->va.addr;
+
+ /* If this is a repeated mapping, both the GEM range
+ * and offset must match.
+ */
+ if (a->gem.range != b->gem.range ||
+ a->gem.offset != b->gem.offset)
+ return false;
+
+ /* The difference between the VA addresses must be a
+ * multiple of the repeated range, otherwise there's
+ * a shift.
+ */
+ if (do_div(va_diff, a->gem.range))
+ return false;
+
+ return true;
+ }
+
/* We intentionally ignore u64 underflows because all we care about
* here is whether the VA diff matches the GEM offset diff.
*/
return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
}
+static int check_map_req(struct drm_gpuvm *gpuvm,
+ const struct drm_gpuvm_map_req *req)
+{
+ if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
+ return -EINVAL;
+
+ if (req->flags & DRM_GPUVA_REPEAT) {
+ u64 va_range = req->va.range;
+
+ /* For a repeated mapping, the GEM range must be > 0
+ * and the VA range must be a multiple of the GEM range.
+ */
+ if (unlikely(!req->gem.range ||
+ va_range < req->gem.range ||
+ do_div(va_range, req->gem.range)))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int
__drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
const struct drm_gpuvm_ops *ops, void *priv,
@@ -2137,6 +2179,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
struct drm_gpuva reqva = {
.va.addr = req->va.addr,
.va.range = req->va.range,
+ .gem.range = req->gem.range,
.gem.offset = req->gem.offset,
.gem.obj = req->gem.obj,
.flags = req->flags,
@@ -2144,7 +2187,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
u64 req_end = req->va.addr + req->va.range;
int ret;
- if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
+ ret = check_map_req(gpuvm, req);
+ if (unlikely(ret))
return -EINVAL;
drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req->va.addr, req_end) {
@@ -2175,7 +2219,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.addr = req_end,
.va.range = range - req->va.range,
.gem.obj = obj,
- .gem.offset = offset + req->va.range,
+ .gem.range = va->gem.range,
+ .gem.offset = offset,
.flags = va->flags,
};
struct drm_gpuva_op_unmap u = {
@@ -2183,6 +2228,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.keep = merge,
};
+ if (!(va->flags & DRM_GPUVA_REPEAT))
+ n.gem.offset += req->va.range;
+
ret = op_remap_cb(ops, priv, NULL, &n, &u);
if (ret)
return ret;
@@ -2194,6 +2242,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.addr = addr,
.va.range = ls_range,
.gem.obj = obj,
+ .gem.range = va->gem.range,
.gem.offset = offset,
.flags = va->flags,
};
@@ -2220,11 +2269,14 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.addr = req_end,
.va.range = end - req_end,
.gem.obj = obj,
- .gem.offset = offset + ls_range +
- req->va.range,
+ .gem.range = va->gem.range,
+ .gem.offset = offset,
.flags = va->flags,
};
+ if (!(va->flags & DRM_GPUVA_REPEAT))
+ n.gem.offset += ls_range + req->va.range;
+
ret = op_remap_cb(ops, priv, &p, &n, &u);
if (ret)
return ret;
@@ -2250,7 +2302,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.va.addr = req_end,
.va.range = end - req_end,
.gem.obj = obj,
- .gem.offset = offset + req_end - addr,
+ .gem.range = va->gem.range,
+ .gem.offset = offset,
.flags = va->flags,
};
struct drm_gpuva_op_unmap u = {
@@ -2258,6 +2311,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
.keep = merge,
};
+ if (!(va->flags & DRM_GPUVA_REPEAT))
+ n.gem.offset += req_end - addr;
+
ret = op_remap_cb(ops, priv, NULL, &n, &u);
if (ret)
return ret;
@@ -2294,6 +2350,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
prev.va.addr = addr;
prev.va.range = req_addr - addr;
prev.gem.obj = obj;
+ prev.gem.range = va->gem.range;
prev.gem.offset = offset;
prev.flags = va->flags;
@@ -2304,7 +2361,10 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
next.va.addr = req_end;
next.va.range = end - req_end;
next.gem.obj = obj;
- next.gem.offset = offset + (req_end - addr);
+ next.gem.range = va->gem.range;
+ next.gem.offset = offset;
+ if (!(va->flags & DRM_GPUVA_REPEAT))
+ next.gem.offset += req_end - addr;
next.flags = va->flags;
next_split = true;
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index f77a89e791f1..629e8508f99f 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -56,10 +56,19 @@ enum drm_gpuva_flags {
*/
DRM_GPUVA_SPARSE = (1 << 1),
+ /**
+ * @DRM_GPUVA_REPEAT:
+ *
+ * Flag indicating that the &drm_gpuva is a mapping of a GEM
+ * portion repeated multiple times to fill the virtual address
+ * range.
+ */
+ DRM_GPUVA_REPEAT = (1 << 2),
+
/**
* @DRM_GPUVA_USERBITS: user defined bits
*/
- DRM_GPUVA_USERBITS = (1 << 2),
+ DRM_GPUVA_USERBITS = (1 << 3),
};
/**
@@ -111,6 +120,18 @@ struct drm_gpuva {
*/
u64 offset;
+ /**
+ * @gem.range: the range of the GEM that is mapped
+ *
+ * When dealing with normal mappings, this must be zero.
+ * When flags has DRM_GPUVA_REPEAT set, this field must be
+ * smaller than va.range and va.range must be a multiple of
+ * gem.range.
+ * This is a u32 not a u64 because we expect repeated mappings
+ * to be pointing to relatively small portions of a GEM object.
+ */
+ u32 range;
+
/**
* @gem.obj: the mapped &drm_gem_object
*/
@@ -842,6 +863,17 @@ struct drm_gpuva_op_map {
*/
u64 offset;
+ /**
+ * @gem.range: the range of the GEM that is mapped
+ *
+ * When dealing with normal mappings, this must be zero.
+ * When flags has DRM_GPUVA_REPEAT set, this field must be smaller
+ * than va.range, and va.range must be a multiple of gem.range.
+ * This is a u32 not a u64 because we expect repeated mappings to
+ * be pointing to a relatively small portion of a GEM object.
+ */
+ u32 range;
+
/**
* @gem.obj: the &drm_gem_object to map
*/
@@ -1078,6 +1110,15 @@ struct drm_gpuvm_map_req {
/** @offset: offset in the GEM */
u64 offset;
+
+ /**
+ * @range: size of the range of the GEM object to map
+ *
+ * Must be zero unless flags has DRM_GPUVA_REPEAT set.
+ * If DRM_GPUVA_REPEAT is set, this field must be less than va.range,
+ * and va.range must be a multiple of gem.range.
+ */
+ u32 range;
} gem;
/** @flags: combination of DRM_GPUVA_ flags describing the mapping properties. */
--
2.47.2
* [PATCH v4 7/7] drm/panthor: Add support for repeated mappings
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
` (5 preceding siblings ...)
2025-07-07 17:04 ` [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic Caterina Shablia
@ 2025-07-07 17:04 ` Caterina Shablia
2025-07-11 14:03 ` Steven Price
6 siblings, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-07 17:04 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina,
Caterina Shablia
From: Boris Brezillon <boris.brezillon@collabora.com>
This allows us to optimize the mapping of a relatively small
portion of a BO repeated over a large VA range, which is
useful to support Vulkan sparse bindings in an efficient way.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Co-developed-by: Caterina Shablia <caterina.shablia@collabora.com>
Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 3 +-
drivers/gpu/drm/panthor/panthor_mmu.c | 78 ++++++++++++++++++++++++---
include/uapi/drm/panthor_drm.h | 23 ++++++++
3 files changed, 95 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 1116f2d2826e..585c07b07c42 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1608,6 +1608,7 @@ static void panthor_debugfs_init(struct drm_minor *minor)
* - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
* - 1.4 - adds DRM_IOCTL_PANTHOR_BO_SET_LABEL ioctl
* - 1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
+ * - 1.6 - adds DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT flag
*/
static const struct drm_driver panthor_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -1621,7 +1622,7 @@ static const struct drm_driver panthor_drm_driver = {
.name = "panthor",
.desc = "Panthor DRM driver",
.major = 1,
- .minor = 5,
+ .minor = 6,
.gem_create_object = panthor_gem_create_object,
.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index f0a22b775958..4ce9fff67d69 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -202,6 +202,9 @@ struct panthor_vm_op_ctx {
/** @map.bo_offset: Offset in the buffer object. */
u64 bo_offset;
+ /** @map.bo_repeat_range: Repeated BO range. */
+ u32 bo_repeat_range;
+
/**
* @map.sgt: sg-table pointing to pages backing the GEM object.
*
@@ -1007,6 +1010,26 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
return 0;
}
+static int
+panthor_vm_repeated_map_pages(struct panthor_vm *vm, u64 iova, int prot,
+ struct sg_table *sgt, u64 offset, u64 size,
+ u64 count)
+{
+ /* FIXME: we really need to optimize this at the io_pgtable level. */
+ for (u64 i = 0; i < count; i++) {
+ int ret;
+
+ ret = panthor_vm_map_pages(vm, iova + (size * i), prot,
+ sgt, offset, size);
+ if (ret) {
+ panthor_vm_unmap_pages(vm, iova, size * i);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static int flags_to_prot(u32 flags)
{
int prot = 0;
@@ -1203,12 +1226,14 @@ panthor_vm_op_ctx_prealloc_vmas(struct panthor_vm_op_ctx *op_ctx)
(DRM_PANTHOR_VM_BIND_OP_MAP_READONLY | \
DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | \
DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED | \
+ DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT | \
DRM_PANTHOR_VM_BIND_OP_TYPE_MASK)
static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
struct panthor_vm *vm,
struct panthor_gem_object *bo,
u64 offset,
+ u32 repeat_range,
u64 size, u64 va,
u32 flags)
{
@@ -1224,9 +1249,22 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
(flags & DRM_PANTHOR_VM_BIND_OP_TYPE_MASK) != DRM_PANTHOR_VM_BIND_OP_TYPE_MAP)
return -EINVAL;
- /* Make sure the VA and size are aligned and in-bounds. */
- if (size > bo->base.base.size || offset > bo->base.base.size - size)
- return -EINVAL;
+ if (!(flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT)) {
+ /* Make sure the VA and size are in-bounds. */
+ if (size > bo->base.base.size || offset > bo->base.base.size - size)
+ return -EINVAL;
+ } else {
+ /* Make sure the repeat_range is in-bounds. */
+ if (repeat_range > bo->base.base.size || offset > bo->base.base.size - repeat_range)
+ return -EINVAL;
+
+ /* Make sure size is a multiple of repeat_range */
+
+ u64 repeat_count = size;
+
+ if (do_div(repeat_count, repeat_range))
+ return -EINVAL;
+ }
/* If the BO has an exclusive VM attached, it can't be mapped to other VMs. */
if (bo->exclusive_vm_root_gem &&
@@ -1295,6 +1333,7 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
drm_gem_shmem_unpin(&bo->base);
op_ctx->map.bo_offset = offset;
+ op_ctx->map.bo_repeat_range = repeat_range;
/* L1, L2 and L3 page tables.
* We could optimize L3 allocation by iterating over the sgt and merging
@@ -2112,9 +2151,22 @@ static int panthor_gpuva_sm_step_map(struct drm_gpuva_op *op, void *priv)
panthor_vma_init(vma, op_ctx->flags & PANTHOR_VM_MAP_FLAGS);
- ret = panthor_vm_map_pages(vm, op->map.va.addr, flags_to_prot(vma->flags),
- op_ctx->map.sgt, op->map.gem.offset,
- op->map.va.range);
+ if (op_ctx->flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT) {
+ u64 repeat_count = op->map.va.range;
+
+ do_div(repeat_count, op->map.gem.range);
+ ret = panthor_vm_repeated_map_pages(vm, op->map.va.addr,
+ flags_to_prot(vma->flags),
+ op_ctx->map.sgt,
+ op->map.gem.offset,
+ op->map.gem.range,
+ repeat_count);
+ } else {
+ ret = panthor_vm_map_pages(vm, op->map.va.addr,
+ flags_to_prot(vma->flags),
+ op_ctx->map.sgt, op->map.gem.offset,
+ op->map.va.range);
+ }
if (ret)
return ret;
@@ -2237,7 +2289,7 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
switch (op_type) {
case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP: {
- const struct drm_gpuvm_map_req map_req = {
+ struct drm_gpuvm_map_req map_req = {
.va.addr = op->va.addr,
.va.range = op->va.range,
.gem.obj = op->map.vm_bo->obj,
@@ -2249,6 +2301,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
break;
}
+ if (op->flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT) {
+ map_req.flags |= DRM_GPUVA_REPEAT;
+ map_req.gem.range = op->map.bo_repeat_range;
+ }
+
ret = drm_gpuvm_sm_map(&vm->base, vm, &map_req);
break;
}
@@ -2497,6 +2554,7 @@ panthor_vm_bind_prepare_op_ctx(struct drm_file *file,
ret = panthor_vm_prepare_map_op_ctx(op_ctx, vm,
gem ? to_panthor_bo(gem) : NULL,
op->bo_offset,
+ op->bo_repeat_range,
op->size,
op->va,
op->flags);
@@ -2698,7 +2756,11 @@ int panthor_vm_map_bo_range(struct panthor_vm *vm, struct panthor_gem_object *bo
struct panthor_vm_op_ctx op_ctx;
int ret;
- ret = panthor_vm_prepare_map_op_ctx(&op_ctx, vm, bo, offset, size, va, flags);
+ /* TODO: would be nice to replace with assert instead */
+ if (flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT)
+ return -EINVAL;
+
+ ret = panthor_vm_prepare_map_op_ctx(&op_ctx, vm, bo, offset, 0, size, va, flags);
if (ret)
return ret;
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index e1f43deb7eca..ad278bc234b0 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -496,6 +496,17 @@ enum drm_panthor_vm_bind_op_flags {
*/
DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED = 1 << 2,
+ /**
+ * @DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT: Repeat a BO range
+ *
+ * Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.
+ *
+ * When this is set, a BO range is repeated over the VA range.
+ * drm_panthor_vm_bind_op::bo_repeat_range defines the size of the
+ * BO range to repeat.
+ */
+ DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT = 1 << 3,
+
/**
* @DRM_PANTHOR_VM_BIND_OP_TYPE_MASK: Mask used to determine the type of operation.
*/
@@ -560,6 +571,18 @@ struct drm_panthor_vm_bind_op {
*/
struct drm_panthor_obj_array syncs;
+ /**
+ * @bo_repeat_range: The size of the range to be repeated.
+ *
+ * Must be zero if DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT is not set in
+ * flags.
+ *
+ * When the flag is set, drm_panthor_vm_bind_op::size must be a
+ * multiple of bo_repeat_range.
+ */
+ __u32 bo_repeat_range;
+
+ /** @pad: Padding field. MBZ. */
+ __u32 pad;
};
/**
--
2.47.2
* Re: [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init()
2025-07-07 17:04 ` [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init() Caterina Shablia
@ 2025-07-07 18:41 ` Danilo Krummrich
2025-07-11 13:30 ` Steven Price
1 sibling, 0 replies; 36+ messages in thread
From: Danilo Krummrich @ 2025-07-07 18:41 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Boris Brezillon, Steven Price, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, dri-devel, linux-kernel,
nouveau, intel-xe, asahi, Asahi Lina
On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> From: Boris Brezillon <boris.brezillon@collabora.com>
>
> drm_gpuva_init() only has one internal user, and given we are about to
> add new optional fields, it only adds maintenance burden for no real
> benefit, so let's kill the thing now.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
* Re: [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct
2025-07-07 17:04 ` [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct Caterina Shablia
@ 2025-07-07 18:44 ` Danilo Krummrich
2025-08-21 11:53 ` Boris Brezillon
0 siblings, 1 reply; 36+ messages in thread
From: Danilo Krummrich @ 2025-07-07 18:44 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Boris Brezillon, Steven Price, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, dri-devel, linux-kernel,
nouveau, intel-xe, asahi, Asahi Lina
On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> +/**
> + * struct drm_gpuvm_map_req - arguments passed to drm_gpuvm_sm_map[_ops_create]()
> + */
> +struct drm_gpuvm_map_req {
> + /** @va: virtual address related fields */
> + struct {
> + /** @va.addr: start of the virtual address range to map to */
> + u64 addr;
> +
> + /** @va.range: size of the virtual address range to map to */
> + u64 range;
> + } va;
> +
> + /** @gem: GEM related fields */
> + struct {
> + /**
> + * @obj: GEM object to map.
> + *
> + * Can be NULL if the virtual range is not backed by a GEM object.
> + */
> + struct drm_gem_object *obj;
> +
> + /** @offset: offset in the GEM */
> + u64 offset;
> + } gem;
> +};
Can't we just use struct drm_gpuva_op_map instead? The structure is identical
and represents exactly what we want, a map operation.
Let's use that instead of redefining an identical structure.
With that,
Acked-by: Danilo Krummrich <dakr@kernel.org>
* Re: [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-07 17:04 ` [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged Caterina Shablia
@ 2025-07-07 19:00 ` Danilo Krummrich
2025-07-07 19:06 ` Danilo Krummrich
2025-08-21 12:06 ` Boris Brezillon
2025-07-22 19:17 ` Adrian Larumbe
1 sibling, 2 replies; 36+ messages in thread
From: Danilo Krummrich @ 2025-07-07 19:00 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Boris Brezillon, Steven Price, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, dri-devel, linux-kernel,
nouveau, intel-xe, asahi, Asahi Lina
On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 05978c5c38b1..dc3c2f906400 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
> return fn->sm_step_unmap(&op, priv);
> }
>
> +static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
> + const struct drm_gpuva *b)
> +{
> + /* Only GEM-based mappings can be merged, and they must point to
> + * the same GEM object.
> + */
> + if (a->gem.obj != b->gem.obj || !a->gem.obj)
> + return false;
> +
> + /* Let's keep things simple for now and force all flags to match. */
> + if (a->flags != b->flags)
> + return false;
> +
> + /* Order VAs for the rest of the checks. */
> + if (a->va.addr > b->va.addr)
> + swap(a, b);
> +
> + /* We assume the caller already checked that VAs overlap or are
> + * contiguous.
> + */
> + if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
> + return false;
> +
> + /* We intentionally ignore u64 underflows because all we care about
> + * here is whether the VA diff matches the GEM offset diff.
> + */
> + return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
> +}
> +
> static int
> __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> const struct drm_gpuvm_ops *ops, void *priv,
> const struct drm_gpuvm_map_req *req)
> {
> struct drm_gpuva *va, *next;
> + struct drm_gpuva reqva = {
> + .va.addr = req->va.addr,
> + .va.range = req->va.range,
> + .gem.offset = req->gem.offset,
> + .gem.obj = req->gem.obj,
> + .flags = req->flags,
Huh? Where does req->flags come from? I don't remember that this flag exists in
struct drm_gpuvm_map_req in the preceding patch?
> + };
> u64 req_end = req->va.addr + req->va.range;
> int ret;
>
> @@ -2116,12 +2152,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> u64 addr = va->va.addr;
> u64 range = va->va.range;
> u64 end = addr + range;
> - bool merge = !!va->gem.obj;
> + bool merge = can_merge(gpuvm, va, &reqva);
I know you want to do the swap() trick above, but I don't like creating a
temporary struct drm_gpuva with all the other uninitialized fields.
If you really want this, can we please limit the scope? Maybe the following
helper:
static bool can_merge(struct drm_gpuvm *gpuvm,
const struct drm_gpuva *va,
struct drm_gpuvm_map_req *req)
{
struct drm_gpuva reqva = { ... };
return __can_merge(gpuvm, va, reqva);
}
* Re: [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map
2025-07-07 17:04 ` [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map Caterina Shablia
@ 2025-07-07 19:03 ` Danilo Krummrich
2025-07-22 19:21 ` Adrian Larumbe
1 sibling, 0 replies; 36+ messages in thread
From: Danilo Krummrich @ 2025-07-07 19:03 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Boris Brezillon, Steven Price, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, dri-devel, linux-kernel,
nouveau, intel-xe, asahi, Asahi Lina
On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> From: Asahi Lina <lina@asahilina.net>
>
> drm_gpuva objects have a flags field. Currently, this can be managed by
> drivers out-of-band, without any special handling in drm_gpuvm.
>
> To be able to introduce flags that do affect the logic in the drm_gpuvm
> core, we need to plumb it through the map calls. This will allow the
> core to check the flags on map and alter the merge/split logic depending
> on the requested flags and the flags of the existing drm_gpuva ranges
> that are being split.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-07 19:00 ` Danilo Krummrich
@ 2025-07-07 19:06 ` Danilo Krummrich
2025-08-21 12:18 ` Boris Brezillon
2025-08-21 12:06 ` Boris Brezillon
1 sibling, 1 reply; 36+ messages in thread
From: Danilo Krummrich @ 2025-07-07 19:06 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Boris Brezillon,
Steven Price, Liviu Dudau, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, dri-devel, linux-kernel, nouveau, intel-xe, asahi,
Asahi Lina
On Mon Jul 7, 2025 at 9:00 PM CEST, Danilo Krummrich wrote:
> On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
>> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
>> index 05978c5c38b1..dc3c2f906400 100644
>> --- a/drivers/gpu/drm/drm_gpuvm.c
>> +++ b/drivers/gpu/drm/drm_gpuvm.c
>> @@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
>> return fn->sm_step_unmap(&op, priv);
>> }
>>
>> +static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
>> + const struct drm_gpuva *b)
>> +{
>> + /* Only GEM-based mappings can be merged, and they must point to
>> + * the same GEM object.
>> + */
>> + if (a->gem.obj != b->gem.obj || !a->gem.obj)
>> + return false;
>> +
>> + /* Let's keep things simple for now and force all flags to match. */
>> + if (a->flags != b->flags)
>> + return false;
Forgot to mention, this can include driver-specific flags. How do we know from
the generic code whether this condition makes sense? *At least* it would need
to be documented.
However, I think it would be better to provide an optional callback that lets
drivers decide whether a merge makes sense. That doesn't mean drivers would
have to duplicate the common checks above; those can remain here in the common
code.
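Just to sketch what I mean -- the callback name and exact signature are made
up, nothing like this exists today, and it would reuse the ops pointer
__drm_gpuvm_sm_map() already gets:

        /* New, optional hook in struct drm_gpuvm_ops: */
        bool (*va_mergeable)(const struct drm_gpuva *a,
                             const struct drm_gpuva *b);

        /* And in __drm_gpuvm_sm_map(), on top of the common checks: */
        bool merge = can_merge(gpuvm, va, &reqva);

        if (merge && ops->va_mergeable)
                merge = ops->va_mergeable(va, &reqva);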
>> +
>> + /* Order VAs for the rest of the checks. */
>> + if (a->va.addr > b->va.addr)
>> + swap(a, b);
>> +
>> + /* We assume the caller already checked that VAs overlap or are
>> + * contiguous.
>> + */
>> + if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
>> + return false;
>> +
>> + /* We intentionally ignore u64 underflows because all we care about
>> + * here is whether the VA diff matches the GEM offset diff.
>> + */
>> + return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
>> +}
>> +
>> static int
>> __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>> const struct drm_gpuvm_ops *ops, void *priv,
>> const struct drm_gpuvm_map_req *req)
>> {
>> struct drm_gpuva *va, *next;
>> + struct drm_gpuva reqva = {
>> + .va.addr = req->va.addr,
>> + .va.range = req->va.range,
>> + .gem.offset = req->gem.offset,
>> + .gem.obj = req->gem.obj,
>> + .flags = req->flags,
>
> Huh? Where does req->flags come from? I don't remember this field existing in
> struct drm_gpuvm_map_req in the preceding patch.
>
>> + };
>> u64 req_end = req->va.addr + req->va.range;
>> int ret;
>>
>> @@ -2116,12 +2152,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
>> u64 addr = va->va.addr;
>> u64 range = va->va.range;
>> u64 end = addr + range;
>> - bool merge = !!va->gem.obj;
>> + bool merge = can_merge(gpuvm, va, &reqva);
>
> I know you want to do the swap() trick above, but I don't like creating a
> temporary struct drm_gpuva with all the other uninitialized fields.
>
> If you really want this, can we please limit the scope? Maybe the following
> helper:
>
> static bool can_merge(struct drm_gpuvm *gpuvm,
> const struct drm_gpuva *va,
> struct drm_gpuvm_map_req *req)
> {
> struct drm_gpuva reqva = { ... };
> return __can_merge(gpuvm, va, reqva);
> }
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic
2025-07-07 17:04 ` [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic Caterina Shablia
@ 2025-07-07 19:33 ` Danilo Krummrich
2025-08-21 12:29 ` Boris Brezillon
0 siblings, 1 reply; 36+ messages in thread
From: Danilo Krummrich @ 2025-07-07 19:33 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Boris Brezillon, Steven Price, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, dri-devel, linux-kernel,
nouveau, intel-xe, asahi, Asahi Lina, Asahi Lina
On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> From: Asahi Lina <lina+kernel@asahilina.net>
>
> To be able to support "fake sparse" mappings without relying on GPU page
> fault handling, drivers may need to create large (e.g. 4GiB) mappings of
> the same page repeatedly (or same range of pages). Doing this through
> individual mappings would be very wasteful. This can be handled better
> by using a flag on map creation, but to do it safely, drm_gpuvm needs to
> be aware of this special case.
>
> Add a flag that signals that a given mapping is a page mapping, which is
> repeated all over the entire requested VA range. This tweaks the
> sm_map() logic to treat the GEM offsets differently when mappings are
> a repeated ones so they are not incremented as they would be with regular
> mappings.
>
> The size of the GEM portion to repeat is passed through
> drm_gpuva::gem::range. Most of the time it will be a page size, but
> it can be bigger as long as it's less that drm_gpuva::va::range, and
> drm_gpuva::gem::range is a multiple of drm_gpuva::va::range.
Should be "as long as it's less that drm_gpuva::va::range, and
drm_gpuva::va::range is a multiple of drm_gpuva::gem::range".
I also think this feature deserves its own section in the global GPUVM
documentation -- please add a corresponding paragraph.
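Something along these lines as a starting point (wording obviously to be
refined, this just restates what the commit message in this patch says):

        /**
         * DOC: Repeated mappings
         *
         * Besides regular and sparse mappings, drm_gpuvm supports repeated
         * mappings: a single &drm_gpuva with the DRM_GPUVA_REPEAT flag maps
         * the same GEM range (drm_gpuva::gem::range bytes starting at
         * drm_gpuva::gem::offset) over and over until the VA range is
         * filled. drm_gpuva::va::range must be a multiple of
         * drm_gpuva::gem::range, and the GEM offset is not advanced when
         * such a mapping is split.
         */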
> +static int check_map_req(struct drm_gpuvm *gpuvm,
Let's call this validate_map_request().
> + const struct drm_gpuvm_map_req *req)
> +{
> + if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
> + return -EINVAL;
> +
> + if (req->flags & DRM_GPUVA_REPEAT) {
> + u64 va_range = req->va.range;
> +
> + /* For a repeated mapping, GEM range must be > 0
> + * and a multiple of the VA range.
> + */
> + if (unlikely(!req->gem.range ||
> + va_range < req->gem.range ||
> + do_div(va_range, req->gem.range)))
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> static int
> __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> const struct drm_gpuvm_ops *ops, void *priv,
> @@ -2137,6 +2179,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> struct drm_gpuva reqva = {
> .va.addr = req->va.addr,
> .va.range = req->va.range,
> + .gem.range = req->gem.range,
> .gem.offset = req->gem.offset,
> .gem.obj = req->gem.obj,
> .flags = req->flags,
> @@ -2144,7 +2187,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> u64 req_end = req->va.addr + req->va.range;
> int ret;
>
> - if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
> + ret = check_map_req(gpuvm, req);
> + if (unlikely(ret))
> return -EINVAL;
>
> drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req->va.addr, req_end) {
> @@ -2175,7 +2219,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.addr = req_end,
> .va.range = range - req->va.range,
> .gem.obj = obj,
> - .gem.offset = offset + req->va.range,
> + .gem.range = va->gem.range,
> + .gem.offset = offset,
Why change this from offset + req->va.range to just offset?
The same applies to the other similar changes below.
Also, it seems we need to update the documentation that lists all the potential
cases handled by __drm_gpuvm_sm_map() [1].
[1] https://docs.kernel.org/gpu/drm-mm.html#split-and-merge
> .flags = va->flags,
> };
> struct drm_gpuva_op_unmap u = {
> @@ -2183,6 +2228,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .keep = merge,
> };
>
> + if (!(va->flags & DRM_GPUVA_REPEAT))
> + n.gem.offset += req->va.range;
> +
> ret = op_remap_cb(ops, priv, NULL, &n, &u);
> if (ret)
> return ret;
> @@ -2194,6 +2242,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.addr = addr,
> .va.range = ls_range,
> .gem.obj = obj,
> + .gem.range = va->gem.range,
> .gem.offset = offset,
> .flags = va->flags,
> };
> @@ -2220,11 +2269,14 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.addr = req_end,
> .va.range = end - req_end,
> .gem.obj = obj,
> - .gem.offset = offset + ls_range +
> - req->va.range,
> + .gem.range = va->gem.range,
> + .gem.offset = offset,
> .flags = va->flags,
> };
>
> + if (!(va->flags & DRM_GPUVA_REPEAT))
> + n.gem.offset += ls_range + req->va.range;
> +
> ret = op_remap_cb(ops, priv, &p, &n, &u);
> if (ret)
> return ret;
> @@ -2250,7 +2302,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.addr = req_end,
> .va.range = end - req_end,
> .gem.obj = obj,
> - .gem.offset = offset + req_end - addr,
> + .gem.range = va->gem.range,
> + .gem.offset = offset,
> .flags = va->flags,
> };
> struct drm_gpuva_op_unmap u = {
> @@ -2258,6 +2311,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .keep = merge,
> };
>
> + if (!(va->flags & DRM_GPUVA_REPEAT))
> + n.gem.offset += req_end - addr;
> +
> ret = op_remap_cb(ops, priv, NULL, &n, &u);
> if (ret)
> return ret;
> @@ -2294,6 +2350,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
> prev.va.addr = addr;
> prev.va.range = req_addr - addr;
> prev.gem.obj = obj;
> + prev.gem.range = va->gem.range;
> prev.gem.offset = offset;
> prev.flags = va->flags;
>
> @@ -2304,7 +2361,10 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
> next.va.addr = req_end;
> next.va.range = end - req_end;
> next.gem.obj = obj;
> - next.gem.offset = offset + (req_end - addr);
> + prev.gem.range = va->gem.range;
> + next.gem.offset = offset;
> + if (!(va->flags & DRM_GPUVA_REPEAT))
> + next.gem.offset += req_end - addr;
> next.flags = va->flags;
>
> next_split = true;
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index f77a89e791f1..629e8508f99f 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -56,10 +56,19 @@ enum drm_gpuva_flags {
> */
> DRM_GPUVA_SPARSE = (1 << 1),
>
> + /**
> + * @DRM_GPUVA_REPEAT:
> + *
> + * Flag indicating that the &drm_gpuva is a mapping of a GEM
> + * portion repeated multiple times to fill the virtual address
"of a GEM object with a certain range that is repeated multiple times to ..."
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-07 17:04 ` [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates Caterina Shablia
@ 2025-07-11 13:30 ` Steven Price
2025-07-15 15:08 ` Caterina Shablia
2025-08-21 11:36 ` Boris Brezillon
0 siblings, 2 replies; 36+ messages in thread
From: Steven Price @ 2025-07-11 13:30 UTC (permalink / raw)
To: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 07/07/2025 18:04, Caterina Shablia wrote:
> From: Boris Brezillon <boris.brezillon@collabora.com>
>
> Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
> we can implement true atomic page updates, where any access in the
> locked range done by the GPU has to wait for the page table updates
> to land before proceeding.
>
> This is needed for vkQueueBindSparse(), so we can replace the dummy
> page mapped over the entire object by actual BO backed pages in an atomic
> way.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> ---
> drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
> 1 file changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index b39ea6acc6a9..9caaba03c5eb 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -387,6 +387,15 @@ struct panthor_vm {
> * flagged as faulty as a result.
> */
> bool unhandled_fault;
> +
> + /** @locked_region: Information about the currently locked region currently. */
> + struct {
> + /** @locked_region.start: Start of the locked region. */
> + u64 start;
> +
> + /** @locked_region.size: Size of the locked region. */
> + u64 size;
> + } locked_region;
> };
>
> /**
> @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
> }
>
> ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> + if (!ret && vm->locked_region.size) {
> + lock_region(ptdev, vm->as.id, vm->locked_region.start, vm->locked_region.size);
> + ret = wait_ready(ptdev, vm->as.id);
> + }
Do we need to do this? It seems odd to restore a MMU context and
immediately set a lock region. Is there a good reason we can't just
WARN_ON if there's a lock region set in panthor_vm_idle()?
I think we need to briefly take vm->op_lock to ensure synchronisation
but that doesn't seem a big issue. Or perhaps there's a good reason that
I'm missing?
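I.e. something like (untested):

        /* in panthor_vm_idle(), with whatever locking ends up being needed */
        drm_WARN_ON(&vm->ptdev->base, vm->locked_region.size);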
>
> out_make_active:
> if (!ret) {
> @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm *vm, u64 iova, u64 size)
> struct io_pgtable_ops *ops = vm->pgtbl_ops;
> u64 offset = 0;
>
> + drm_WARN_ON(&ptdev->base,
> + (iova < vm->locked_region.start) ||
> + (iova + size > vm->locked_region.start + vm->locked_region.size));
> drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm->as.id, iova, size);
>
> while (offset < size) {
> @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct panthor_vm *vm, u64 iova, u64 size)
> iova + offset + unmapped_sz,
> iova + offset + pgsize * pgcount,
> iova, iova + size);
> - panthor_vm_flush_range(vm, iova, offset + unmapped_sz);
> return -EINVAL;
> }
> offset += unmapped_sz;
> }
>
> - return panthor_vm_flush_range(vm, iova, size);
> + return 0;
> }
>
> static int
> @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> if (!size)
> return 0;
>
> + drm_WARN_ON(&ptdev->base,
> + (iova < vm->locked_region.start) ||
> + (iova + size > vm->locked_region.start + vm->locked_region.size));
> +
> for_each_sgtable_dma_sg(sgt, sgl, count) {
> dma_addr_t paddr = sg_dma_address(sgl);
> size_t len = sg_dma_len(sgl);
> @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> offset = 0;
> }
>
> - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
> + return 0;
> }
>
> static int flags_to_prot(u32 flags)
> @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct panthor_device *ptdev,
> }
> }
>
> +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
> +{
> + struct panthor_device *ptdev = vm->ptdev;
> + int ret = 0;
> +
> + mutex_lock(&ptdev->mmu->as.slots_lock);
> + drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
> + vm->locked_region.start = start;
> + vm->locked_region.size = size;
> + if (vm->as.id >= 0) {
> + lock_region(ptdev, vm->as.id, start, size);
> + ret = wait_ready(ptdev, vm->as.id);
> + }
> + mutex_unlock(&ptdev->mmu->as.slots_lock);
> +
> + return ret;
> +}
> +
> +static void panthor_vm_unlock_region(struct panthor_vm *vm)
> +{
> + struct panthor_device *ptdev = vm->ptdev;
> +
> + mutex_lock(&ptdev->mmu->as.slots_lock);
> + if (vm->as.id >= 0) {
> + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
> + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm->as.id));
> + }
> + vm->locked_region.start = 0;
> + vm->locked_region.size = 0;
> + mutex_unlock(&ptdev->mmu->as.slots_lock);
> +}
Do we need to include a drm_dev_enter() somewhere here? I note that
panthor_vm_flush_range() has one and you've effectively replaced that
code with the above.
Thanks,
Steve
> +
> static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> {
> bool has_unhandled_faults = false;
> @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
>
> mutex_lock(&vm->op_lock);
> vm->op_ctx = op;
> +
> + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
> + if (ret)
> + goto out;
> +
> switch (op_type) {
> case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
> if (vm->unusable) {
> @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
> break;
> }
>
> + panthor_vm_unlock_region(vm);
> +
> +out:
> if (ret && flag_vm_unusable_on_failure)
> vm->unusable = true;
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init()
2025-07-07 17:04 ` [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init() Caterina Shablia
2025-07-07 18:41 ` Danilo Krummrich
@ 2025-07-11 13:30 ` Steven Price
1 sibling, 0 replies; 36+ messages in thread
From: Steven Price @ 2025-07-11 13:30 UTC (permalink / raw)
To: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 07/07/2025 18:04, Caterina Shablia wrote:
> From: Boris Brezillon <boris.brezillon@collabora.com>
>
> drm_gpuva_init() only has one internal user, and given we are about to
> add new optional fields, it only add maintenance burden for no real
> benefit, so let's kill the thing now.
Commit 111fdd2198e6 ("drm/msm: drm_gpuvm conversion") has added another
use in msm.
Steve
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> ---
> include/drm/drm_gpuvm.h | 15 ++++-----------
> 1 file changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index 2a9629377633..6fdf2aff3e90 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -160,15 +160,6 @@ struct drm_gpuva *drm_gpuva_find_first(struct drm_gpuvm *gpuvm,
> struct drm_gpuva *drm_gpuva_find_prev(struct drm_gpuvm *gpuvm, u64 start);
> struct drm_gpuva *drm_gpuva_find_next(struct drm_gpuvm *gpuvm, u64 end);
>
> -static inline void drm_gpuva_init(struct drm_gpuva *va, u64 addr, u64 range,
> - struct drm_gem_object *obj, u64 offset)
> -{
> - va->va.addr = addr;
> - va->va.range = range;
> - va->gem.obj = obj;
> - va->gem.offset = offset;
> -}
> -
> /**
> * drm_gpuva_invalidate() - sets whether the backing GEM of this &drm_gpuva is
> * invalidated
> @@ -1079,8 +1070,10 @@ void drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
> static inline void drm_gpuva_init_from_op(struct drm_gpuva *va,
> struct drm_gpuva_op_map *op)
> {
> - drm_gpuva_init(va, op->va.addr, op->va.range,
> - op->gem.obj, op->gem.offset);
> + va->va.addr = op->va.addr;
> + va->va.range = op->va.range;
> + va->gem.obj = op->gem.obj;
> + va->gem.offset = op->gem.offset;
> }
>
> /**
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 7/7] drm/panthor: Add support for repeated mappings
2025-07-07 17:04 ` [PATCH v4 7/7] drm/panthor: Add support for repeated mappings Caterina Shablia
@ 2025-07-11 14:03 ` Steven Price
2025-07-15 15:17 ` Caterina Shablia
0 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2025-07-11 14:03 UTC (permalink / raw)
To: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 07/07/2025 18:04, Caterina Shablia wrote:
> From: Boris Brezillon <boris.brezillon@collabora.com>
>
> This allows us to optimize mapping of a relatively small
> portion of a BO over and over in a large VA range, which
> is useful to support Vulkan sparse bindings in an efficient
> way.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Co-developed-by: Caterina Shablia <caterina.shablia@collabora.com>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
This looks like the right sort of shape. From an uAPI perspective I'm
not sure whether u32 is the right type for bo_repeat_range. While I
can't immediately see a use for having a large repeat range it seems a
little silly to close it off when we're going to have padding afterwards
anyway. Obviously the kernel would reject large values because the
internal APIs are only u32. But it would enable this to be fixed if we
ever discover a usecase without a uAPI change.
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 3 +-
> drivers/gpu/drm/panthor/panthor_mmu.c | 78 ++++++++++++++++++++++++---
> include/uapi/drm/panthor_drm.h | 23 ++++++++
> 3 files changed, 95 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 1116f2d2826e..585c07b07c42 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1608,6 +1608,7 @@ static void panthor_debugfs_init(struct drm_minor *minor)
> * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
> * - 1.4 - adds DRM_IOCTL_PANTHOR_BO_SET_LABEL ioctl
> * - 1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
> + * - 1.6 - adds DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT flag
> */
> static const struct drm_driver panthor_drm_driver = {
> .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
> @@ -1621,7 +1622,7 @@ static const struct drm_driver panthor_drm_driver = {
> .name = "panthor",
> .desc = "Panthor DRM driver",
> .major = 1,
> - .minor = 5,
> + .minor = 6,
>
> .gem_create_object = panthor_gem_create_object,
> .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index f0a22b775958..4ce9fff67d69 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -202,6 +202,9 @@ struct panthor_vm_op_ctx {
> /** @map.bo_offset: Offset in the buffer object. */
> u64 bo_offset;
>
> + /** @map.bo_repeat_range: Repeated BO range. */
> + u32 bo_repeat_range;
> +
> /**
> * @map.sgt: sg-table pointing to pages backing the GEM object.
> *
> @@ -1007,6 +1010,26 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> return 0;
> }
>
> +static int
> +panthor_vm_repeated_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> + struct sg_table *sgt, u64 offset, u64 size,
> + u64 count)
> +{
> + /* FIXME: we really need to optimize this at the io_pgtable level. */
Do you have plans for optimizing this? How bad is the performance
without optimizing?
> + for (u64 i = 0; i < count; i++) {
> + int ret;
> +
> + ret = panthor_vm_map_pages(vm, iova + (size * i), prot,
> + sgt, offset, size);
> + if (ret) {
> + panthor_vm_unmap_pages(vm, iova, size * (i - 1));
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int flags_to_prot(u32 flags)
> {
> int prot = 0;
> @@ -1203,12 +1226,14 @@ panthor_vm_op_ctx_prealloc_vmas(struct panthor_vm_op_ctx *op_ctx)
> (DRM_PANTHOR_VM_BIND_OP_MAP_READONLY | \
> DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | \
> DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED | \
> + DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT | \
> DRM_PANTHOR_VM_BIND_OP_TYPE_MASK)
>
> static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
> struct panthor_vm *vm,
> struct panthor_gem_object *bo,
> u64 offset,
> + u32 repeat_range,
> u64 size, u64 va,
> u32 flags)
> {
> @@ -1224,9 +1249,22 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
> (flags & DRM_PANTHOR_VM_BIND_OP_TYPE_MASK) != DRM_PANTHOR_VM_BIND_OP_TYPE_MAP)
> return -EINVAL;
>
> - /* Make sure the VA and size are aligned and in-bounds. */
> - if (size > bo->base.base.size || offset > bo->base.base.size - size)
> - return -EINVAL;
> + if (!(flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT)) {
> + /* Make sure the VA and size are in-bounds. */
> + if (size > bo->base.base.size || offset > bo->base.base.size - size)
> + return -EINVAL;
> + } else {
> + /* Make sure the repeat_range is in-bounds. */
> + if (repeat_range > bo->base.base.size || offset > bo->base.base.size - repeat_range)
> + return -EINVAL;
> +
> + /* Make sure size is a multiple of repeat_range */
> +
> + u64 repeat_count = size;
> +
> + if (do_div(repeat_count, repeat_range))
> + return -EINVAL;
> + }
>
> /* If the BO has an exclusive VM attached, it can't be mapped to other VMs. */
> if (bo->exclusive_vm_root_gem &&
> @@ -1295,6 +1333,7 @@ static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx *op_ctx,
> drm_gem_shmem_unpin(&bo->base);
>
> op_ctx->map.bo_offset = offset;
> + op_ctx->map.bo_repeat_range = repeat_range;
>
> /* L1, L2 and L3 page tables.
> * We could optimize L3 allocation by iterating over the sgt and merging
> @@ -2112,9 +2151,22 @@ static int panthor_gpuva_sm_step_map(struct drm_gpuva_op *op, void *priv)
>
> panthor_vma_init(vma, op_ctx->flags & PANTHOR_VM_MAP_FLAGS);
>
> - ret = panthor_vm_map_pages(vm, op->map.va.addr, flags_to_prot(vma->flags),
> - op_ctx->map.sgt, op->map.gem.offset,
> - op->map.va.range);
> + if (op_ctx->flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT) {
> + u64 repeat_count = op->map.va.range;
> +
> + do_div(repeat_count, op->map.gem.range);
> + ret = panthor_vm_repeated_map_pages(vm, op->map.va.addr,
> + flags_to_prot(vma->flags),
> + op_ctx->map.sgt,
> + op->map.gem.offset,
> + op->map.gem.range,
> + repeat_count);
> + } else {
> + ret = panthor_vm_map_pages(vm, op->map.va.addr,
> + flags_to_prot(vma->flags),
> + op_ctx->map.sgt, op->map.gem.offset,
> + op->map.va.range);
> + }
> if (ret)
> return ret;
>
> @@ -2237,7 +2289,7 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
>
> switch (op_type) {
> case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP: {
> - const struct drm_gpuvm_map_req map_req = {
> + struct drm_gpuvm_map_req map_req = {
> .va.addr = op->va.addr,
> .va.range = op->va.range,
> .gem.obj = op->map.vm_bo->obj,
> @@ -2249,6 +2301,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
> break;
> }
>
> + if (op->flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT) {
> + map_req.flags |= DRM_GPUVA_REPEAT;
> + map_req.gem.range = op->map.bo_repeat_range;
> + }
> +
> ret = drm_gpuvm_sm_map(&vm->base, vm, &map_req);
> break;
> }
> @@ -2497,6 +2554,7 @@ panthor_vm_bind_prepare_op_ctx(struct drm_file *file,
> ret = panthor_vm_prepare_map_op_ctx(op_ctx, vm,
> gem ? to_panthor_bo(gem) : NULL,
> op->bo_offset,
> + op->bo_repeat_range,
> op->size,
> op->va,
> op->flags);
> @@ -2698,7 +2756,11 @@ int panthor_vm_map_bo_range(struct panthor_vm *vm, struct panthor_gem_object *bo
> struct panthor_vm_op_ctx op_ctx;
> int ret;
>
> - ret = panthor_vm_prepare_map_op_ctx(&op_ctx, vm, bo, offset, size, va, flags);
> + /* TODO: would be nice to replace with assert instead */
If you don't expect this to happen then this can be a "if (WARN_ON(...))".
> + if (flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT)
> + return -EINVAL;
> +
> + ret = panthor_vm_prepare_map_op_ctx(&op_ctx, vm, bo, offset, 0, size, va, flags);
> if (ret)
> return ret;
>
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index e1f43deb7eca..ad278bc234b0 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -496,6 +496,17 @@ enum drm_panthor_vm_bind_op_flags {
> */
> DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED = 1 << 2,
>
> + /**
> + * @DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT: Repeat a BO range
> + *
> + * Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.
> + *
> + * When this is set, a BO range is repeated over the VA range.
> + * drm_panthor_vm_bind_op::bo_repeat_range defines the size of the
> + * BO range to repeat.
> + */
> + DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT = 1 << 3,
> +
> /**
> * @DRM_PANTHOR_VM_BIND_OP_TYPE_MASK: Mask used to determine the type of operation.
> */
> @@ -560,6 +571,18 @@ struct drm_panthor_vm_bind_op {
> */
> struct drm_panthor_obj_array syncs;
>
> + /**
> + * @bo_repeat_range: The size of the range to be repeated.
> + *
> + * Must be zero if DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT is not set in
> + * flags.
> + *
> + * Size must be a multiple of bo_repeat_range.
> + */
> + __u32 bo_repeat_range;
> +
> + /** @pad: Padding field. MBZ. */
> + __u32 pad;
If we're going to have the padding then the kernel needs to check that
this padding is zero, so that it can be available for future use.
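I.e. something like this in panthor_vm_bind_prepare_op_ctx() (or wherever the
op ends up being validated -- just a sketch):

        if (op->pad)
                return -EINVAL;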
Steve
> };
>
> /**
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-11 13:30 ` Steven Price
@ 2025-07-15 15:08 ` Caterina Shablia
2025-07-15 15:33 ` Caterina Shablia
2025-07-15 16:09 ` Caterina Shablia
2025-08-21 11:36 ` Boris Brezillon
1 sibling, 2 replies; 36+ messages in thread
From: Caterina Shablia @ 2025-07-15 15:08 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Steven Price
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
El viernes, 11 de julio de 2025 15:30:21 (hora de verano de Europa central),
Steven Price escribió:
> On 07/07/2025 18:04, Caterina Shablia wrote:
> > From: Boris Brezillon <boris.brezillon@collabora.com>
> >
> > Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
> > we can implement true atomic page updates, where any access in the
> > locked range done by the GPU has to wait for the page table updates
> > to land before proceeding.
> >
> > This is needed for vkQueueBindSparse(), so we can replace the dummy
> > page mapped over the entire object by actual BO backed pages in an atomic
> > way.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> > ---
> >
> > drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
> > 1 file changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
> > b/drivers/gpu/drm/panthor/panthor_mmu.c index b39ea6acc6a9..9caaba03c5eb
> > 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -387,6 +387,15 @@ struct panthor_vm {
> >
> > * flagged as faulty as a result.
> > */
> >
> > bool unhandled_fault;
> >
> > +
> > + /** @locked_region: Information about the currently locked region
> > currently. */ + struct {
> > + /** @locked_region.start: Start of the locked region.
*/
> > + u64 start;
> > +
> > + /** @locked_region.size: Size of the locked region. */
> > + u64 size;
> > + } locked_region;
> >
> > };
> >
> > /**
> >
> > @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
> >
> > }
> >
> > ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab,
transcfg,
> > vm->memattr);>
> > + if (!ret && vm->locked_region.size) {
> > + lock_region(ptdev, vm->as.id, vm->locked_region.start,
> > vm->locked_region.size); + ret = wait_ready(ptdev, vm-
>as.id);
> > + }
>
> Do we need to do this? It seems odd to restore a MMU context and
> immediately set a lock region. Is there a good reason we can't just
> WARN_ON if there's a lock region set in panthor_vm_idle()?
So IIUC, when things are otherwise idle and we do a vm_bind, the vm will be
inactive, in which case we're not going to poke the mmu to inform it of the
locked region, because it literally is not aware of this vm. Now if, in the
meantime, something submits a job and activates the vm, we need to inform the
mmu of the locked region, as the vm_bind job might still be in progress. I
don't see why panthor_vm_idle() with a region still locked would be a problem;
that can happen e.g. when a job completes while a concurrent vm_bind is going
on.
>
> I think we need to briefly take vm->op_lock to ensure synchronisation
> but that doesn't seem a big issue. Or perhaps there's a good reason that
> I'm missing?
I think you're right, all other accesses to locked_region are guarded by
op_lock. GPU job submission pokes panthor_vm_active() concurrently with
vm_bind jobs doing region {,un}locks.
>
> > out_make_active:
> > if (!ret) {
> >
> > @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm
> > *vm, u64 iova, u64 size)>
> > struct io_pgtable_ops *ops = vm->pgtbl_ops;
> > u64 offset = 0;
> >
> > + drm_WARN_ON(&ptdev->base,
> > + (iova < vm->locked_region.start) ||
> > + (iova + size > vm->locked_region.start + vm-
>locked_region.size));
> >
> > drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm-
>as.id,
> > iova, size);
> >
> > while (offset < size) {
> >
> > @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct panthor_vm
> > *vm, u64 iova, u64 size)>
> > iova + offset + unmapped_sz,
> > iova + offset + pgsize * pgcount,
> > iova, iova + size);
> >
> > - panthor_vm_flush_range(vm, iova, offset +
unmapped_sz);
> >
> > return -EINVAL;
> >
> > }
> > offset += unmapped_sz;
> >
> > }
> >
> > - return panthor_vm_flush_range(vm, iova, size);
> > + return 0;
> >
> > }
> >
> > static int
> >
> > @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova,
> > int prot,>
> > if (!size)
> >
> > return 0;
> >
> > + drm_WARN_ON(&ptdev->base,
> > + (iova < vm->locked_region.start) ||
> > + (iova + size > vm->locked_region.start + vm-
>locked_region.size));
> > +
> >
> > for_each_sgtable_dma_sg(sgt, sgl, count) {
> >
> > dma_addr_t paddr = sg_dma_address(sgl);
> > size_t len = sg_dma_len(sgl);
> >
> > @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova,
> > int prot,>
> > offset = 0;
> >
> > }
> >
> > - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
> > + return 0;
> >
> > }
> >
> > static int flags_to_prot(u32 flags)
> >
> > @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct
> > panthor_device *ptdev,>
> > }
> >
> > }
> >
> > +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64
> > size) +{
> > + struct panthor_device *ptdev = vm->ptdev;
> > + int ret = 0;
> > +
> > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > + drm_WARN_ON(&ptdev->base, vm->locked_region.start ||
> > vm->locked_region.size); + vm->locked_region.start = start;
> > + vm->locked_region.size = size;
> > + if (vm->as.id >= 0) {
> > + lock_region(ptdev, vm->as.id, start, size);
> > + ret = wait_ready(ptdev, vm->as.id);
> > + }
> > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +
> > + return ret;
> > +}
> > +
> > +static void panthor_vm_unlock_region(struct panthor_vm *vm)
> > +{
> > + struct panthor_device *ptdev = vm->ptdev;
> > +
> > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > + if (vm->as.id >= 0) {
> > + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
> > + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm-
>as.id));
> > + }
> > + vm->locked_region.start = 0;
> > + vm->locked_region.size = 0;
> > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +}
>
> Do we need to include a drm_dev_enter() somewhere here? I note that
> panthor_vm_flush_range() has one and you've effectively replaced that
> code with the above.
Looks like we should.
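Something like this, I suppose (rough sketch, mirroring what
panthor_vm_flush_range() does, with the same drm_dev_enter()/drm_dev_exit()
dance around the FLUSH_MEM in panthor_vm_unlock_region()):

static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
{
        struct panthor_device *ptdev = vm->ptdev;
        int ret = 0, cookie;

        mutex_lock(&ptdev->mmu->as.slots_lock);
        drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
        vm->locked_region.start = start;
        vm->locked_region.size = size;
        if (vm->as.id >= 0 && drm_dev_enter(&ptdev->base, &cookie)) {
                lock_region(ptdev, vm->as.id, start, size);
                ret = wait_ready(ptdev, vm->as.id);
                drm_dev_exit(cookie);
        }
        mutex_unlock(&ptdev->mmu->as.slots_lock);

        return ret;
}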
>
> Thanks,
> Steve
>
> > +
> >
> > static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32
> > status) {
> >
> > bool has_unhandled_faults = false;
> >
> > @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
> > panthor_vm_op_ctx *op,>
> > mutex_lock(&vm->op_lock);
> > vm->op_ctx = op;
> >
> > +
> > + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
> > + if (ret)
> > + goto out;
> > +
> >
> > switch (op_type) {
> >
> > case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
> > if (vm->unusable) {
> >
> > @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
> > panthor_vm_op_ctx *op,>
> > break;
> >
> > }
> >
> > + panthor_vm_unlock_region(vm);
> > +
> >
> > +out:
> > if (ret && flag_vm_unusable_on_failure)
> >
> > vm->unusable = true;
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 7/7] drm/panthor: Add support for repeated mappings
2025-07-11 14:03 ` Steven Price
@ 2025-07-15 15:17 ` Caterina Shablia
2025-07-16 15:59 ` Steven Price
0 siblings, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-15 15:17 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Steven Price
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
El viernes, 11 de julio de 2025 16:03:26 (hora de verano de Europa central),
Steven Price escribió:
> On 07/07/2025 18:04, Caterina Shablia wrote:
> > From: Boris Brezillon <boris.brezillon@collabora.com>
> >
> > This allows us to optimize mapping of a relatively small
> > portion of a BO over and over in a large VA range, which
> > is useful to support Vulkan sparse bindings in an efficient
> > way.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Co-developed-by: Caterina Shablia <caterina.shablia@collabora.com>
> > Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
>
> This looks like the right sort of shape. From an uAPI perspective I'm
> not sure whether u32 is the right type for bo_repeat_range. While I
> can't immediately see a use for having a large repeat range it seems a
> little silly to close it off when we're going to have padding afterwards
> anyway. Obviously the kernel would reject large values because the
> internal APIs are only u32. But it would enable this to be fixed if we
> ever discover a usecase without a uAPI change.
>
> > ---
> >
> > drivers/gpu/drm/panthor/panthor_drv.c | 3 +-
> > drivers/gpu/drm/panthor/panthor_mmu.c | 78 ++++++++++++++++++++++++---
> > include/uapi/drm/panthor_drm.h | 23 ++++++++
> > 3 files changed, 95 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c
> > b/drivers/gpu/drm/panthor/panthor_drv.c index 1116f2d2826e..585c07b07c42
> > 100644
> > --- a/drivers/gpu/drm/panthor/panthor_drv.c
> > +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> > @@ -1608,6 +1608,7 @@ static void panthor_debugfs_init(struct drm_minor
> > *minor)>
> > * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
> > * - 1.4 - adds DRM_IOCTL_PANTHOR_BO_SET_LABEL ioctl
> > * - 1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
> >
> > + * - 1.6 - adds DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT flag
> >
> > */
> >
> > static const struct drm_driver panthor_drm_driver = {
> >
> > .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
> >
> > @@ -1621,7 +1622,7 @@ static const struct drm_driver panthor_drm_driver =
> > {
> >
> > .name = "panthor",
> > .desc = "Panthor DRM driver",
> > .major = 1,
> >
> > - .minor = 5,
> > + .minor = 6,
> >
> > .gem_create_object = panthor_gem_create_object,
> > .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
> > b/drivers/gpu/drm/panthor/panthor_mmu.c index f0a22b775958..4ce9fff67d69
> > 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -202,6 +202,9 @@ struct panthor_vm_op_ctx {
> >
> > /** @map.bo_offset: Offset in the buffer object. */
> > u64 bo_offset;
> >
> > + /** @map.bo_repeat_range: Repeated BO range. */
> > + u32 bo_repeat_range;
> > +
> >
> > /**
> >
> > * @map.sgt: sg-table pointing to pages backing the GEM
object.
> > *
> >
> > @@ -1007,6 +1010,26 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
> > iova, int prot,>
> > return 0;
> >
> > }
> >
> > +static int
> > +panthor_vm_repeated_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> > + struct sg_table *sgt, u64 offset, u64
size,
> > + u64 count)
> > +{
> > + /* FIXME: we really need to optimize this at the io_pgtable level.
*/
>
> Do you have plans for optimizing this?
I personally don't have any plans, no, but maybe Boris does. I'll forward this
question to him once he's back from his vacation.
> How bad is the performance
> without optimizing?
It's better than the alternative of poking vm_bind with a morbillion
drm_panthor_vm_bind_ops. More seriously, I don't really have any workloads
besides VK CTS to measure, for now. There's some stuff we should try to do in
panvk first, like using a 2M dummy_page and doing some gymnastics when mapping
it so we get huge mappings, which will hopefully lessen the pressure on this
loop.
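For context, on the panvk side a sparse unmap with this uapi boils down to a
single bind op, something like (illustrative only, the names are made up):

        struct drm_panthor_vm_bind_op op = {
                .flags = DRM_PANTHOR_VM_BIND_OP_TYPE_MAP |
                         DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT,
                .bo_handle = dummy_bo_handle,
                .bo_offset = 0,
                .bo_repeat_range = dummy_bo_size, /* 4k today, ideally 2M later */
                .va = sparse_range_va,
                .size = sparse_range_size,        /* multiple of bo_repeat_range */
        };

so the kernel-side loop runs size / bo_repeat_range times, which is where a 2M
dummy BO would help (512x fewer iterations than with a 4k one).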
>
> > + for (u64 i = 0; i < count; i++) {
> > + int ret;
> > +
> > + ret = panthor_vm_map_pages(vm, iova + (size * i), prot,
> > + sgt, offset, size);
> > + if (ret) {
> > + panthor_vm_unmap_pages(vm, iova, size * (i -
1));
> > + return ret;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> >
> > static int flags_to_prot(u32 flags)
> > {
> >
> > int prot = 0;
> >
> > @@ -1203,12 +1226,14 @@ panthor_vm_op_ctx_prealloc_vmas(struct
> > panthor_vm_op_ctx *op_ctx)>
> > (DRM_PANTHOR_VM_BIND_OP_MAP_READONLY | \
> >
> > DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC | \
> > DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED | \
> >
> > + DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT | \
> >
> > DRM_PANTHOR_VM_BIND_OP_TYPE_MASK)
> >
> > static int panthor_vm_prepare_map_op_ctx(struct panthor_vm_op_ctx
> > *op_ctx,
> >
> > struct panthor_vm
*vm,
> > struct
panthor_gem_object *bo,
> > u64 offset,
> >
> > + u32 repeat_range,
> >
> > u64 size, u64 va,
> > u32 flags)
> >
> > {
> >
> > @@ -1224,9 +1249,22 @@ static int panthor_vm_prepare_map_op_ctx(struct
> > panthor_vm_op_ctx *op_ctx,>
> > (flags & DRM_PANTHOR_VM_BIND_OP_TYPE_MASK) !=
> > DRM_PANTHOR_VM_BIND_OP_TYPE_MAP)>
> > return -EINVAL;
> >
> > - /* Make sure the VA and size are aligned and in-bounds. */
> > - if (size > bo->base.base.size || offset > bo->base.base.size -
size)
> > - return -EINVAL;
> > + if (!(flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT)) {
> > + /* Make sure the VA and size are in-bounds. */
> > + if (size > bo->base.base.size || offset > bo-
>base.base.size - size)
> > + return -EINVAL;
> > + } else {
> > + /* Make sure the repeat_range is in-bounds. */
> > + if (repeat_range > bo->base.base.size || offset > bo-
>base.base.size -
> > repeat_range) + return -EINVAL;
> > +
> > + /* Make sure size is a multiple of repeat_range */
> > +
> > + u64 repeat_count = size;
> > +
> > + if (do_div(repeat_count, repeat_range))
> > + return -EINVAL;
> > + }
> >
> > /* If the BO has an exclusive VM attached, it can't be mapped to
other
> > VMs. */ if (bo->exclusive_vm_root_gem &&
> >
> > @@ -1295,6 +1333,7 @@ static int panthor_vm_prepare_map_op_ctx(struct
> > panthor_vm_op_ctx *op_ctx,>
> > drm_gem_shmem_unpin(&bo->base);
> >
> > op_ctx->map.bo_offset = offset;
> >
> > + op_ctx->map.bo_repeat_range = repeat_range;
> >
> > /* L1, L2 and L3 page tables.
> >
> > * We could optimize L3 allocation by iterating over the sgt and
merging
> >
> > @@ -2112,9 +2151,22 @@ static int panthor_gpuva_sm_step_map(struct
> > drm_gpuva_op *op, void *priv)>
> > panthor_vma_init(vma, op_ctx->flags & PANTHOR_VM_MAP_FLAGS);
> >
> > - ret = panthor_vm_map_pages(vm, op->map.va.addr,
> > flags_to_prot(vma->flags), -
op_ctx->map.sgt, op->map.gem.offset,
> > - op->map.va.range);
> > + if (op_ctx->flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT) {
> > + u64 repeat_count = op->map.va.range;
> > +
> > + do_div(repeat_count, op->map.gem.range);
> > + ret = panthor_vm_repeated_map_pages(vm, op-
>map.va.addr,
> > +
flags_to_prot(vma->flags),
> > + op_ctx-
>map.sgt,
> > + op-
>map.gem.offset,
> > + op-
>map.gem.range,
> > +
repeat_count);
> > + } else {
> > + ret = panthor_vm_map_pages(vm, op->map.va.addr,
> > + flags_to_prot(vma-
>flags),
> > + op_ctx->map.sgt,
op->map.gem.offset,
> > + op->map.va.range);
> > + }
> >
> > if (ret)
> >
> > return ret;
> >
> > @@ -2237,7 +2289,7 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
> > panthor_vm_op_ctx *op,>
> > switch (op_type) {
> > case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP: {
> >
> > - const struct drm_gpuvm_map_req map_req = {
> > + struct drm_gpuvm_map_req map_req = {
> >
> > .va.addr = op->va.addr,
> > .va.range = op->va.range,
> > .gem.obj = op->map.vm_bo->obj,
> >
> > @@ -2249,6 +2301,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
> > panthor_vm_op_ctx *op,>
> > break;
> >
> > }
> >
> > + if (op->flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT) {
> > + map_req.flags |= DRM_GPUVA_REPEAT;
> > + map_req.gem.range = op->map.bo_repeat_range;
> > + }
> > +
> >
> > ret = drm_gpuvm_sm_map(&vm->base, vm, &map_req);
> > break;
> >
> > }
> >
> > @@ -2497,6 +2554,7 @@ panthor_vm_bind_prepare_op_ctx(struct drm_file
> > *file,
> >
> > ret = panthor_vm_prepare_map_op_ctx(op_ctx, vm,
> >
> > gem ?
to_panthor_bo(gem) : NULL,
> > op-
>bo_offset,
> >
> > + op-
>bo_repeat_range,
> >
> > op-
>size,
> > op->va,
> > op-
>flags);
> >
> > @@ -2698,7 +2756,11 @@ int panthor_vm_map_bo_range(struct panthor_vm *vm,
> > struct panthor_gem_object *bo>
> > struct panthor_vm_op_ctx op_ctx;
> > int ret;
> >
> > - ret = panthor_vm_prepare_map_op_ctx(&op_ctx, vm, bo, offset, size,
va,
> > flags); + /* TODO: would be nice to replace with assert instead */
>
> If you don't expect this to happen then this can be a "if (WARN_ON(...))".
>
> > + if (flags & DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT)
> > + return -EINVAL;
> > +
> > + ret = panthor_vm_prepare_map_op_ctx(&op_ctx, vm, bo, offset, 0,
size,
> > va, flags);>
> > if (ret)
> >
> > return ret;
> >
> > diff --git a/include/uapi/drm/panthor_drm.h
> > b/include/uapi/drm/panthor_drm.h index e1f43deb7eca..ad278bc234b0 100644
> > --- a/include/uapi/drm/panthor_drm.h
> > +++ b/include/uapi/drm/panthor_drm.h
> > @@ -496,6 +496,17 @@ enum drm_panthor_vm_bind_op_flags {
> >
> > */
> >
> > DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED = 1 << 2,
> >
> > + /**
> > + * @DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT: Repeat a BO range
> > + *
> > + * Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.
> > + *
> > + * When this is set, a BO range is repeated over the VA range.
> > + * drm_panthor_vm_bind_op::bo_repeat_range defines the size of the
> > + * BO range to repeat.
> > + */
> > + DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT = 1 << 3,
> > +
> >
> > /**
> >
> > * @DRM_PANTHOR_VM_BIND_OP_TYPE_MASK: Mask used to determine the
type of
> > operation. */
> >
> > @@ -560,6 +571,18 @@ struct drm_panthor_vm_bind_op {
> >
> > */
> >
> > struct drm_panthor_obj_array syncs;
> >
> > + /**
> > + * @bo_repeat_range: The size of the range to be repeated.
> > + *
> > + * Must be zero if DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT is not set in
> > + * flags.
> > + *
> > + * Size must be a multiple of bo_repeat_range.
> > + */
> > + __u32 bo_repeat_range;
> > +
> > + /** @pad: Padding field. MBZ. */
> > + __u32 pad;
>
> If we're going to have the padding then the kernel needs to check that
> this padding is zero, so that it can be available for future use.
I decided to go with your suggestion and changed bo_repeat_range to an __u64,
rejecting vm_binds with values above 2^32-1 for now.
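Roughly (sketch; where exactly the check lands may still change):

        /* uapi side: */
        __u64 bo_repeat_range;

        /* kernel side, before handing it to the internal u32 paths: */
        if (op->bo_repeat_range > U32_MAX)
                return -EINVAL;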
>
> Steve
>
> > };
> >
> > /**
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-15 15:08 ` Caterina Shablia
@ 2025-07-15 15:33 ` Caterina Shablia
2025-07-16 15:43 ` Steven Price
2025-07-15 16:09 ` Caterina Shablia
1 sibling, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-15 15:33 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Steven Price
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
El martes, 15 de julio de 2025 17:08:09 (hora de verano de Europa central),
Caterina Shablia escribió:
> El viernes, 11 de julio de 2025 15:30:21 (hora de verano de Europa central),
> Steven Price escribió:
> > On 07/07/2025 18:04, Caterina Shablia wrote:
> > > From: Boris Brezillon <boris.brezillon@collabora.com>
> > >
> > > Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
> > > we can implement true atomic page updates, where any access in the
> > > locked range done by the GPU has to wait for the page table updates
> > > to land before proceeding.
> > >
> > > This is needed for vkQueueBindSparse(), so we can replace the dummy
> > > page mapped over the entire object by actual BO backed pages in an
> > > atomic
> > > way.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> > > ---
> > >
> > > drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
> > > 1 file changed, 62 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > b/drivers/gpu/drm/panthor/panthor_mmu.c index b39ea6acc6a9..9caaba03c5eb
> > > 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > @@ -387,6 +387,15 @@ struct panthor_vm {
> > >
> > > * flagged as faulty as a result.
> > > */
> > >
> > > bool unhandled_fault;
> > >
> > > +
> > > + /** @locked_region: Information about the currently locked region
> > > currently. */ + struct {
> > > + /** @locked_region.start: Start of the locked region.
>
> */
>
> > > + u64 start;
> > > +
> > > + /** @locked_region.size: Size of the locked region. */
> > > + u64 size;
> > > + } locked_region;
> > >
> > > };
> > >
> > > /**
> > >
> > > @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
> > >
> > > }
> > >
> > > ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab,
>
> transcfg,
>
> > > vm->memattr);>
> > >
> > > + if (!ret && vm->locked_region.size) {
> > > + lock_region(ptdev, vm->as.id, vm->locked_region.start,
> > > vm->locked_region.size); + ret = wait_ready(ptdev, vm-
> >
> >as.id);
> >
> > > + }
> >
> > Do we need to do this? It seems odd to restore a MMU context and
> > immediately set a lock region. Is there a good reason we can't just
> > WARN_ON if there's a lock region set in panthor_vm_idle()?
>
> So IIUC, when things are otherwise idle and we do a vm_bind, the vm will be
> inactive, in which case we're not going to poke the mmu to inform it of the
> locked region, because it literally is not aware of this vm. Now if, in the
> meantime, something submits a job and activates the vm, we need to inform
> the mmu of the locked region, as the vm_bind job might still be in progress.
> I don't see why panthor_vm_idle() with a region still locked would be a
> problem; that can happen e.g. when a job completes while a concurrent
> vm_bind is going on.
> > I think we need to briefly take vm->op_lock to ensure synchronisation
> > but that doesn't seem a big issue. Or perhaps there's a good reason that
> > I'm missing?
>
> I think you're right, all other accesses to locked_region are guarded by
> op_lock. GPU job submission pokes panthor_vm_active() concurrently with
> vm_bind jobs doing region {,un}locks.
Actually no, that's not necessary. Access to locked_region is protected by
slots_lock, which is held here. Trying to take vm->op_lock would also be
detrimental here, because these locks are often taken together, with slots_lock
nested inside op_lock, so taking op_lock at this point would be extremely
deadlockful.
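To spell out the ordering as it stands in this series (rough sketch of the two
paths, as I read the patch):

        /* vm_bind path: */
        panthor_vm_exec_op()
            mutex_lock(&vm->op_lock)
            panthor_vm_lock_region()
                mutex_lock(&ptdev->mmu->as.slots_lock)

        /* job submission path: */
        panthor_vm_active()
            mutex_lock(&ptdev->mmu->as.slots_lock)
            /* locked_region restore happens here, under slots_lock */

so taking op_lock from panthor_vm_active(), with slots_lock already held, would
invert that order.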
>
> > > out_make_active:
> > > if (!ret) {
> > >
> > > @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm
> > > *vm, u64 iova, u64 size)>
> > >
> > > struct io_pgtable_ops *ops = vm->pgtbl_ops;
> > > u64 offset = 0;
> > >
> > > + drm_WARN_ON(&ptdev->base,
> > > + (iova < vm->locked_region.start) ||
> > > + (iova + size > vm->locked_region.start + vm-
> >
> >locked_region.size));
> >
> > > drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm-
> >
> >as.id,
> >
> > > iova, size);
> > >
> > > while (offset < size) {
> > >
> > > @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct
> > > panthor_vm
> > > *vm, u64 iova, u64 size)>
> > >
> > > iova + offset + unmapped_sz,
> > > iova + offset + pgsize * pgcount,
> > > iova, iova + size);
> > >
> > > - panthor_vm_flush_range(vm, iova, offset +
>
> unmapped_sz);
>
> > > return -EINVAL;
> > >
> > > }
> > > offset += unmapped_sz;
> > >
> > > }
> > >
> > > - return panthor_vm_flush_range(vm, iova, size);
> > > + return 0;
> > >
> > > }
> > >
> > > static int
> > >
> > > @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
> > > iova,
> > > int prot,>
> > >
> > > if (!size)
> > >
> > > return 0;
> > >
> > > + drm_WARN_ON(&ptdev->base,
> > > + (iova < vm->locked_region.start) ||
> > > + (iova + size > vm->locked_region.start + vm-
> >
> >locked_region.size));
> >
> > > +
> > >
> > > for_each_sgtable_dma_sg(sgt, sgl, count) {
> > >
> > > dma_addr_t paddr = sg_dma_address(sgl);
> > > size_t len = sg_dma_len(sgl);
> > >
> > > @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
> > > iova,
> > > int prot,>
> > >
> > > offset = 0;
> > >
> > > }
> > >
> > > - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
> > > + return 0;
> > >
> > > }
> > >
> > > static int flags_to_prot(u32 flags)
> > >
> > > @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct
> > > panthor_device *ptdev,>
> > >
> > > }
> > >
> > > }
> > >
> > > +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64
> > > size) +{
> > > + struct panthor_device *ptdev = vm->ptdev;
> > > + int ret = 0;
> > > +
> > > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > > + drm_WARN_ON(&ptdev->base, vm->locked_region.start ||
> > > vm->locked_region.size); + vm->locked_region.start = start;
> > > + vm->locked_region.size = size;
> > > + if (vm->as.id >= 0) {
> > > + lock_region(ptdev, vm->as.id, start, size);
> > > + ret = wait_ready(ptdev, vm->as.id);
> > > + }
> > > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static void panthor_vm_unlock_region(struct panthor_vm *vm)
> > > +{
> > > + struct panthor_device *ptdev = vm->ptdev;
> > > +
> > > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > > + if (vm->as.id >= 0) {
> > > + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
> > > + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm-
> >
> >as.id));
> >
> > > + }
> > > + vm->locked_region.start = 0;
> > > + vm->locked_region.size = 0;
> > > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > > +}
> >
> > Do we need to include a drm_dev_enter() somewhere here? I note that
> > panthor_vm_flush_range() has one and you've effectively replaced that
> > code with the above.
>
> Looks like we should.
>
> > Thanks,
> > Steve
> >
> > > +
> > >
> > > static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32
> > > status) {
> > >
> > > bool has_unhandled_faults = false;
> > >
> > > @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
> > > panthor_vm_op_ctx *op,>
> > >
> > > mutex_lock(&vm->op_lock);
> > > vm->op_ctx = op;
> > >
> > > +
> > > + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
> > > + if (ret)
> > > + goto out;
> > > +
> > >
> > > switch (op_type) {
> > >
> > > case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
> > > if (vm->unusable) {
> > >
> > > @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
> > > panthor_vm_op_ctx *op,>
> > >
> > > break;
> > >
> > > }
> > >
> > > + panthor_vm_unlock_region(vm);
> > > +
> > >
> > > +out:
> > > if (ret && flag_vm_unusable_on_failure)
> > >
> > > vm->unusable = true;
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-15 15:08 ` Caterina Shablia
2025-07-15 15:33 ` Caterina Shablia
@ 2025-07-15 16:09 ` Caterina Shablia
2025-07-16 15:53 ` Steven Price
1 sibling, 1 reply; 36+ messages in thread
From: Caterina Shablia @ 2025-07-15 16:09 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi, Steven Price
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Tuesday 15 July 2025 at 17:08:09 (Central European Summer Time),
Caterina Shablia wrote:
> On Friday 11 July 2025 at 15:30:21 (Central European Summer Time),
> Steven Price wrote:
> > On 07/07/2025 18:04, Caterina Shablia wrote:
> > > From: Boris Brezillon <boris.brezillon@collabora.com>
> > >
> > > Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
> > > we can implement true atomic page updates, where any access in the
> > > locked range done by the GPU has to wait for the page table updates
> > > to land before proceeding.
> > >
> > > This is needed for vkQueueBindSparse(), so we can replace the dummy
> > > page mapped over the entire object by actual BO backed pages in an
> > > atomic
> > > way.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> > > ---
> > >
> > > drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
> > > 1 file changed, 62 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > b/drivers/gpu/drm/panthor/panthor_mmu.c index b39ea6acc6a9..9caaba03c5eb
> > > 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > > @@ -387,6 +387,15 @@ struct panthor_vm {
> > >
> > > * flagged as faulty as a result.
> > > */
> > >
> > > bool unhandled_fault;
> > >
> > > +
> > > + /** @locked_region: Information about the currently locked region. */
> > > + struct {
> > > + /** @locked_region.start: Start of the locked region. */
> > > + u64 start;
> > > +
> > > + /** @locked_region.size: Size of the locked region. */
> > > + u64 size;
> > > + } locked_region;
> > >
> > > };
> > >
> > > /**
> > >
> > > @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
> > >
> > > }
> > >
> > > ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab,
>
> transcfg,
>
> > > vm->memattr);>
> > >
> > > + if (!ret && vm->locked_region.size) {
> > > + lock_region(ptdev, vm->as.id, vm->locked_region.start, vm->locked_region.size);
> > > + ret = wait_ready(ptdev, vm->as.id);
> > > + }
> >
> > Do we need to do this? It seems odd to restore a MMU context and
> > immediately set a lock region. Is there a good reason we can't just
> > WARN_ON if there's a lock region set in panthor_vm_idle()?
>
> So IIUC, when things are otherwise idle and we do a vm_bind, the vm will be
> inactive, in which case we're not going to poke the mmu to inform it of the
> locked region, because it literally is not aware of this vm. Now if in the
> meanwhile something submits a job and activates the vm, we need to inform
> the mmu of the locked region, as vm_bind job might still be going on. I
> don't see why panthor_vm_idle while a region is locked would be a problem?
> That can arise e.g. when a job completes but there's a concurrent vm_bind
> going on?
> > I think we need to briefly take vm->op_lock to ensure synchronisation
> > but that doesn't seem a big issue. Or perhaps there's a good reason that
> > I'm missing?
>
> I think you're right, all other accesses to locked_region are guarded by
> op_lock. GPU job submit poke vm_active concurrently with vm_bind jobs doing
> region {,un}locks.
>
> > > out_make_active:
> > > if (!ret) {
> > >
> > > @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm
> > > *vm, u64 iova, u64 size)>
> > >
> > > struct io_pgtable_ops *ops = vm->pgtbl_ops;
> > > u64 offset = 0;
> > >
> > > + drm_WARN_ON(&ptdev->base,
> > > + (iova < vm->locked_region.start) ||
> > > + (iova + size > vm->locked_region.start + vm->locked_region.size));
> > > drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm->as.id, iova, size);
> > >
> > > while (offset < size) {
> > >
> > > @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct
> > > panthor_vm
> > > *vm, u64 iova, u64 size)>
> > >
> > > iova + offset + unmapped_sz,
> > > iova + offset + pgsize * pgcount,
> > > iova, iova + size);
> > >
> > > - panthor_vm_flush_range(vm, iova, offset + unmapped_sz);
> > > return -EINVAL;
> > >
> > > }
> > > offset += unmapped_sz;
> > >
> > > }
> > >
> > > - return panthor_vm_flush_range(vm, iova, size);
> > > + return 0;
> > >
> > > }
> > >
> > > static int
> > >
> > > @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> > >
> > > if (!size)
> > >
> > > return 0;
> > >
> > > + drm_WARN_ON(&ptdev->base,
> > > + (iova < vm->locked_region.start) ||
> > > + (iova + size > vm->locked_region.start + vm->locked_region.size));
> > > +
> > >
> > > for_each_sgtable_dma_sg(sgt, sgl, count) {
> > >
> > > dma_addr_t paddr = sg_dma_address(sgl);
> > > size_t len = sg_dma_len(sgl);
> > >
> > > @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> > >
> > > offset = 0;
> > >
> > > }
> > >
> > > - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
> > > + return 0;
> > >
> > > }
> > >
> > > static int flags_to_prot(u32 flags)
> > >
> > > @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct panthor_device *ptdev,
> > >
> > > }
> > >
> > > }
> > >
> > > +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
> > > +{
> > > + struct panthor_device *ptdev = vm->ptdev;
> > > + int ret = 0;
> > > +
> > > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > > + drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
> > > + vm->locked_region.start = start;
> > > + vm->locked_region.size = size;
> > > + if (vm->as.id >= 0) {
> > > + lock_region(ptdev, vm->as.id, start, size);
> > > + ret = wait_ready(ptdev, vm->as.id);
> > > + }
> > > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static void panthor_vm_unlock_region(struct panthor_vm *vm)
> > > +{
> > > + struct panthor_device *ptdev = vm->ptdev;
> > > +
> > > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > > + if (vm->as.id >= 0) {
> > > + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
> > > + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm->as.id));
> > > + }
> > > + vm->locked_region.start = 0;
> > > + vm->locked_region.size = 0;
> > > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > > +}
> >
> > Do we need to include a drm_dev_enter() somewhere here? I note that
> > panthor_vm_flush_range() has one and you've effectively replaced that
> > code with the above.
>
> Looks like we should.
Actually, I'm not sure. I think I'm either misunderstanding what drm_dev_enter()
is for, or there are other things that should be doing it. Notably,
panthor_mmu_as_{en,dis}able and their callers aren't doing drm_dev_enter(), yet
they poke the hw, so that code also seems to run the risk of poking the hw
while/after it was unplugged, but I'm not confident in my understanding at all.
I guess an extra drm_dev_enter() here or there isn't going to harm anyone since
it can be nested, so I'll put one around the call to lock_region() in
panthor_vm_lock_region() as well.
>
> > Thanks,
> > Steve
> >
> > > +
> > >
> > > static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> > > {
> > >
> > > bool has_unhandled_faults = false;
> > >
> > > @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
> > >
> > > mutex_lock(&vm->op_lock);
> > > vm->op_ctx = op;
> > >
> > > +
> > > + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
> > > + if (ret)
> > > + goto out;
> > > +
> > >
> > > switch (op_type) {
> > >
> > > case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
> > > if (vm->unusable) {
> > >
> > > @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
> > >
> > > break;
> > >
> > > }
> > >
> > > + panthor_vm_unlock_region(vm);
> > > +
> > >
> > > +out:
> > > if (ret && flag_vm_unusable_on_failure)
> > >
> > > vm->unusable = true;
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-15 15:33 ` Caterina Shablia
@ 2025-07-16 15:43 ` Steven Price
2025-08-21 11:51 ` Boris Brezillon
0 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2025-07-16 15:43 UTC (permalink / raw)
To: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 15/07/2025 16:33, Caterina Shablia wrote:
> On Tuesday 15 July 2025 at 17:08:09 (Central European Summer Time),
> Caterina Shablia wrote:
>> On Friday 11 July 2025 at 15:30:21 (Central European Summer Time),
>> Steven Price wrote:
>>> On 07/07/2025 18:04, Caterina Shablia wrote:
>>>> From: Boris Brezillon <boris.brezillon@collabora.com>
>>>>
>>>> Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
>>>> we can implement true atomic page updates, where any access in the
>>>> locked range done by the GPU has to wait for the page table updates
>>>> to land before proceeding.
>>>>
>>>> This is needed for vkQueueBindSparse(), so we can replace the dummy
>>>> page mapped over the entire object by actual BO backed pages in an
>>>> atomic
>>>> way.
>>>>
>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>>> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
>>>> ---
>>>>
>>>> drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
>>>> 1 file changed, 62 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
>>>> b/drivers/gpu/drm/panthor/panthor_mmu.c index b39ea6acc6a9..9caaba03c5eb
>>>> 100644
>>>> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
>>>> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
>>>> @@ -387,6 +387,15 @@ struct panthor_vm {
>>>>
>>>> * flagged as faulty as a result.
>>>> */
>>>>
>>>> bool unhandled_fault;
>>>>
>>>> +
>>>> + /** @locked_region: Information about the currently locked region. */
>>>> + struct {
>>>> + /** @locked_region.start: Start of the locked region. */
>>>> + u64 start;
>>>> +
>>>> + /** @locked_region.size: Size of the locked region. */
>>>> + u64 size;
>>>> + } locked_region;
>>>>
>>>> };
>>>>
>>>> /**
>>>>
>>>> @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
>>>>
>>>> }
>>>>
>>>> ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
>>>>
>>>> + if (!ret && vm->locked_region.size) {
>>>> + lock_region(ptdev, vm->as.id, vm->locked_region.start, vm->locked_region.size);
>>>> + ret = wait_ready(ptdev, vm->as.id);
>>>> + }
>>>
>>> Do we need to do this? It seems odd to restore a MMU context and
>>> immediately set a lock region. Is there a good reason we can't just
>>> WARN_ON if there's a lock region set in panthor_vm_idle()?
>>
>> So IIUC, when things are otherwise idle and we do a vm_bind, the vm will be
>> inactive, in which case we're not going to poke the mmu to inform it of the
>> locked region, because it literally is not aware of this vm. Now if in the
>> meanwhile something submits a job and activates the vm, we need to inform
>> the mmu of the locked region, as vm_bind job might still be going on. I
>> don't see why panthor_vm_idle while a region is locked would be a problem?
>> That can arise e.g. when a job completes but there's a concurrent vm_bind
>> going on?
So it's absolutely fine (and normal) to perform a vm_bind while the VM
is idle. In this case there's no need to perform the lock because
there's nothing running on the GPU which could be affected by the page
tables being (temporarily) inconsistent.
What I'm questioning is the design where if, in the middle of a vm_bind
operation, a job comes in and then we set the lock region rather than
just waiting for the vm_bind operation to complete. AFAICT simply
synchronising on the vm->op_lock should achieve this.
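For illustration, a minimal sketch of that alternative: make panthor_vm_active()
wait for any in-flight VM_BIND op by briefly taking vm->op_lock before
as.slots_lock, instead of replaying lock_region() when an AS slot is
(re)assigned. Lock and field names are taken from the quoted patch;
panthor_vm_enable_as_locked() is a hypothetical placeholder for the existing
AS-enable path, so this is only a sketch of the idea, not the actual
implementation.

static int panthor_vm_active_sketch(struct panthor_vm *vm)
{
        struct panthor_device *ptdev = vm->ptdev;
        int ret;

        /* Serialize against any outstanding VM_BIND op. */
        mutex_lock(&vm->op_lock);
        mutex_lock(&ptdev->mmu->as.slots_lock);

        /* op_lock can be dropped as soon as slots_lock is held. */
        mutex_unlock(&vm->op_lock);

        /* With vm_bind serialized, no lock region should be pending. */
        drm_WARN_ON(&ptdev->base, vm->locked_region.size);

        /* Hypothetical placeholder for the existing AS slot assignment path. */
        ret = panthor_vm_enable_as_locked(vm);

        mutex_unlock(&ptdev->mmu->as.slots_lock);
        return ret;
}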
>>> I think we need to briefly take vm->op_lock to ensure synchronisation
>>> but that doesn't seem a big issue. Or perhaps there's a good reason that
>>> I'm missing?
>>
>> I think you're right, all other accesses to locked_region are guarded by
>> op_lock. GPU job submit poke vm_active concurrently with vm_bind jobs doing
>> region {,un}locks.
> Actually no, that's not necessary. Access to locked_region is protected by
> slots_lock, which is held here. Trying to lock vm->op_lock would also be
> detrimental here, because these locks are often taken together and slots_lock
> is taken after op_lock is taken, so taking op_lock here would be extremely
> deadlockful.
It would obviously be necessary to acquire vm->op_lock before
as.slots_lock as you say to avoid deadlocks. Note that as soon as
as.slots_lock is held vm->op_lock can be dropped.
I just find the current approach a little odd, and unless there's a good
reason for it would prefer that we don't enable a VM on a new address
space while there's an outstanding vm_bind still running. Obviously if
there's a good reason (e.g. we really do expect long running vm_bind
operations) then that just need documenting in the commit message. But
I'm not aware that's the case here.
Although in general I'm a bit wary of relying on the whole lock region
feature - previous GPUs have errata in this area. But maybe I'm being
over-cautious there.
Thanks,
Steve
>>
>>>> out_make_active:
>>>> if (!ret) {
>>>>
>>>> @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm
>>>> *vm, u64 iova, u64 size)>
>>>>
>>>> struct io_pgtable_ops *ops = vm->pgtbl_ops;
>>>> u64 offset = 0;
>>>>
>>>> + drm_WARN_ON(&ptdev->base,
>>>> + (iova < vm->locked_region.start) ||
>>>> + (iova + size > vm->locked_region.start + vm->locked_region.size));
>>>> drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm->as.id, iova, size);
>>>>
>>>> while (offset < size) {
>>>>
>>>> @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct
>>>> panthor_vm
>>>> *vm, u64 iova, u64 size)>
>>>>
>>>> iova + offset + unmapped_sz,
>>>> iova + offset + pgsize * pgcount,
>>>> iova, iova + size);
>>>>
>>>> - panthor_vm_flush_range(vm, iova, offset + unmapped_sz);
>>>> return -EINVAL;
>>>>
>>>> }
>>>> offset += unmapped_sz;
>>>>
>>>> }
>>>>
>>>> - return panthor_vm_flush_range(vm, iova, size);
>>>> + return 0;
>>>>
>>>> }
>>>>
>>>> static int
>>>>
>>>> @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
>>>> iova,
>>>> int prot,>
>>>>
>>>> if (!size)
>>>>
>>>> return 0;
>>>>
>>>> + drm_WARN_ON(&ptdev->base,
>>>> + (iova < vm->locked_region.start) ||
>>>> + (iova + size > vm->locked_region.start + vm->locked_region.size));
>>>> +
>>>>
>>>> for_each_sgtable_dma_sg(sgt, sgl, count) {
>>>>
>>>> dma_addr_t paddr = sg_dma_address(sgl);
>>>> size_t len = sg_dma_len(sgl);
>>>>
>>>> @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
>>>> iova,
>>>> int prot,>
>>>>
>>>> offset = 0;
>>>>
>>>> }
>>>>
>>>> - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
>>>> + return 0;
>>>>
>>>> }
>>>>
>>>> static int flags_to_prot(u32 flags)
>>>>
>>>> @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct
>>>> panthor_device *ptdev,>
>>>>
>>>> }
>>>>
>>>> }
>>>>
>>>> +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
>>>> +{
>>>> + struct panthor_device *ptdev = vm->ptdev;
>>>> + int ret = 0;
>>>> +
>>>> + mutex_lock(&ptdev->mmu->as.slots_lock);
>>>> + drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
>>>> + vm->locked_region.start = start;
>>>> + vm->locked_region.size = size;
>>>> + if (vm->as.id >= 0) {
>>>> + lock_region(ptdev, vm->as.id, start, size);
>>>> + ret = wait_ready(ptdev, vm->as.id);
>>>> + }
>>>> + mutex_unlock(&ptdev->mmu->as.slots_lock);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static void panthor_vm_unlock_region(struct panthor_vm *vm)
>>>> +{
>>>> + struct panthor_device *ptdev = vm->ptdev;
>>>> +
>>>> + mutex_lock(&ptdev->mmu->as.slots_lock);
>>>> + if (vm->as.id >= 0) {
>>>> + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
>>>> + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm->as.id));
>>>> + }
>>>> + vm->locked_region.start = 0;
>>>> + vm->locked_region.size = 0;
>>>> + mutex_unlock(&ptdev->mmu->as.slots_lock);
>>>> +}
>>>
>>> Do we need to include a drm_dev_enter() somewhere here? I note that
>>> panthor_vm_flush_range() has one and you've effectively replaced that
>>> code with the above.
>>
>> Looks like we should.
>>
>>> Thanks,
>>> Steve
>>>
>>>> +
>>>>
>>>> static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32
>>>> status) {
>>>>
>>>> bool has_unhandled_faults = false;
>>>>
>>>> @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
>>>> panthor_vm_op_ctx *op,>
>>>>
>>>> mutex_lock(&vm->op_lock);
>>>> vm->op_ctx = op;
>>>>
>>>> +
>>>> + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
>>>> + if (ret)
>>>> + goto out;
>>>> +
>>>>
>>>> switch (op_type) {
>>>>
>>>> case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
>>>> if (vm->unusable) {
>>>>
>>>> @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
>>>> panthor_vm_op_ctx *op,>
>>>>
>>>> break;
>>>>
>>>> }
>>>>
>>>> + panthor_vm_unlock_region(vm);
>>>> +
>>>>
>>>> +out:
>>>> if (ret && flag_vm_unusable_on_failure)
>>>>
>>>> vm->unusable = true;
>
>
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-15 16:09 ` Caterina Shablia
@ 2025-07-16 15:53 ` Steven Price
0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2025-07-16 15:53 UTC (permalink / raw)
To: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 15/07/2025 17:09, Caterina Shablia wrote:
> On Tuesday 15 July 2025 at 17:08:09 (Central European Summer Time),
> Caterina Shablia wrote:
>> On Friday 11 July 2025 at 15:30:21 (Central European Summer Time),
>> Steven Price wrote:
>>> On 07/07/2025 18:04, Caterina Shablia wrote:
>>>> From: Boris Brezillon <boris.brezillon@collabora.com>
>>>>
>>>> Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
>>>> we can implement true atomic page updates, where any access in the
>>>> locked range done by the GPU has to wait for the page table updates
>>>> to land before proceeding.
>>>>
>>>> This is needed for vkQueueBindSparse(), so we can replace the dummy
>>>> page mapped over the entire object by actual BO backed pages in an
>>>> atomic
>>>> way.
>>>>
>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>>> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
>>>> ---
>>>>
>>>> drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
>>>> 1 file changed, 62 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
>>>> b/drivers/gpu/drm/panthor/panthor_mmu.c index b39ea6acc6a9..9caaba03c5eb
>>>> 100644
>>>> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
>>>> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
>>>> @@ -387,6 +387,15 @@ struct panthor_vm {
>>>>
>>>> * flagged as faulty as a result.
>>>> */
>>>>
>>>> bool unhandled_fault;
>>>>
>>>> +
>>>> + /** @locked_region: Information about the currently locked region. */
>>>> + struct {
>>>> + /** @locked_region.start: Start of the locked region. */
>>>> + u64 start;
>>>> +
>>>> + /** @locked_region.size: Size of the locked region. */
>>>> + u64 size;
>>>> + } locked_region;
>>>>
>>>> };
>>>>
>>>> /**
>>>>
>>>> @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
>>>>
>>>> }
>>>>
>>>> ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
>>>>
>>>> + if (!ret && vm->locked_region.size) {
>>>> + lock_region(ptdev, vm->as.id, vm->locked_region.start, vm->locked_region.size);
>>>> + ret = wait_ready(ptdev, vm->as.id);
>>>> + }
>>>
>>> Do we need to do this? It seems odd to restore a MMU context and
>>> immediately set a lock region. Is there a good reason we can't just
>>> WARN_ON if there's a lock region set in panthor_vm_idle()?
>>
>> So IIUC, when things are otherwise idle and we do a vm_bind, the vm will be
>> inactive, in which case we're not going to poke the mmu to inform it of the
>> locked region, because it literally is not aware of this vm. Now if in the
>> meanwhile something submits a job and activates the vm, we need to inform
>> the mmu of the locked region, as vm_bind job might still be going on. I
>> don't see why panthor_vm_idle while a region is locked would be a problem?
>> That can arise e.g. when a job completes but there's a concurrent vm_bind
>> going on?
>>> I think we need to briefly take vm->op_lock to ensure synchronisation
>>> but that doesn't seem a big issue. Or perhaps there's a good reason that
>>> I'm missing?
>>
>> I think you're right, all other accesses to locked_region are guarded by
>> op_lock. GPU job submit poke vm_active concurrently with vm_bind jobs doing
>> region {,un}locks.
>>
>>>> out_make_active:
>>>> if (!ret) {
>>>>
>>>> @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm
>>>> *vm, u64 iova, u64 size)>
>>>>
>>>> struct io_pgtable_ops *ops = vm->pgtbl_ops;
>>>> u64 offset = 0;
>>>>
>>>> + drm_WARN_ON(&ptdev->base,
>>>> + (iova < vm->locked_region.start) ||
>>>> + (iova + size > vm->locked_region.start + vm->locked_region.size));
>>>> drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm->as.id, iova, size);
>>>>
>>>> while (offset < size) {
>>>>
>>>> @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct
>>>> panthor_vm
>>>> *vm, u64 iova, u64 size)>
>>>>
>>>> iova + offset + unmapped_sz,
>>>> iova + offset + pgsize * pgcount,
>>>> iova, iova + size);
>>>>
>>>> - panthor_vm_flush_range(vm, iova, offset + unmapped_sz);
>>>> return -EINVAL;
>>>>
>>>> }
>>>> offset += unmapped_sz;
>>>>
>>>> }
>>>>
>>>> - return panthor_vm_flush_range(vm, iova, size);
>>>> + return 0;
>>>>
>>>> }
>>>>
>>>> static int
>>>>
>>>> @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
>>>> iova,
>>>> int prot,>
>>>>
>>>> if (!size)
>>>>
>>>> return 0;
>>>>
>>>> + drm_WARN_ON(&ptdev->base,
>>>> + (iova < vm->locked_region.start) ||
>>>> + (iova + size > vm->locked_region.start + vm->locked_region.size));
>>>> +
>>>>
>>>> for_each_sgtable_dma_sg(sgt, sgl, count) {
>>>>
>>>> dma_addr_t paddr = sg_dma_address(sgl);
>>>> size_t len = sg_dma_len(sgl);
>>>>
>>>> @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64
>>>> iova,
>>>> int prot,>
>>>>
>>>> offset = 0;
>>>>
>>>> }
>>>>
>>>> - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
>>>> + return 0;
>>>>
>>>> }
>>>>
>>>> static int flags_to_prot(u32 flags)
>>>>
>>>> @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct
>>>> panthor_device *ptdev,>
>>>>
>>>> }
>>>>
>>>> }
>>>>
>>>> +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
>>>> +{
>>>> + struct panthor_device *ptdev = vm->ptdev;
>>>> + int ret = 0;
>>>> +
>>>> + mutex_lock(&ptdev->mmu->as.slots_lock);
>>>> + drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
>>>> + vm->locked_region.start = start;
>>>> + vm->locked_region.size = size;
>>>> + if (vm->as.id >= 0) {
>>>> + lock_region(ptdev, vm->as.id, start, size);
>>>> + ret = wait_ready(ptdev, vm->as.id);
>>>> + }
>>>> + mutex_unlock(&ptdev->mmu->as.slots_lock);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static void panthor_vm_unlock_region(struct panthor_vm *vm)
>>>> +{
>>>> + struct panthor_device *ptdev = vm->ptdev;
>>>> +
>>>> + mutex_lock(&ptdev->mmu->as.slots_lock);
>>>> + if (vm->as.id >= 0) {
>>>> + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
>>>> + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm->as.id));
>>>> + }
>>>> + vm->locked_region.start = 0;
>>>> + vm->locked_region.size = 0;
>>>> + mutex_unlock(&ptdev->mmu->as.slots_lock);
>>>> +}
>>>
>>> Do we need to include a drm_dev_enter() somewhere here? I note that
>>> panthor_vm_flush_range() has one and you've effectively replaced that
>>> code with the above.
>>
>> Looks like we should.
> Actually, I'm not sure. I think I'm either misunderstanding what drm_dev_enter()
> is for, or there are other things that should be doing it. Notably,
> panthor_mmu_as_{en,dis}able and their callers aren't doing drm_dev_enter(), yet
> they poke the hw, so that code also seems to run the risk of poking the hw
> while/after it was unplugged, but I'm not confident in my understanding at all.
> I guess an extra drm_dev_enter() here or there isn't going to harm anyone since
> it can be nested, so I'll put one around the call to lock_region() in
> panthor_vm_lock_region() as well.
In theory all paths that touch GPU registers should be wrapped in a
drm_dev_enter() somewhere so that we don't poke the hardware when it's
gone. Of course in practice we don't really have hotpluggable hardware
so this isn't exactly well tested. So there might be existing bugs. As
you say it is possible to nest drm_dev_enter() so that might be easiest
in this case.
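For illustration, here is what nesting a drm_dev_enter() section around the
hardware access could look like in the panthor_vm_lock_region() helper quoted
above. This is only a sketch of the idea discussed in this thread, not the
posted patch; everything except the drm_dev_enter()/drm_dev_exit() pair is
copied from the quoted code.

static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
{
        struct panthor_device *ptdev = vm->ptdev;
        int ret = 0, cookie;

        mutex_lock(&ptdev->mmu->as.slots_lock);
        drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
        vm->locked_region.start = start;
        vm->locked_region.size = size;
        if (vm->as.id >= 0 && drm_dev_enter(&ptdev->base, &cookie)) {
                /* Only poke the registers while the device is still there. */
                lock_region(ptdev, vm->as.id, start, size);
                ret = wait_ready(ptdev, vm->as.id);
                drm_dev_exit(cookie);
        }
        mutex_unlock(&ptdev->mmu->as.slots_lock);

        return ret;
}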
We do have the 'automotive' GPUs (not currently supported by panthor[1])
which support virtualisation where it is potentially possible to "rip
out" the GPU from a guest. So I suspect in the future we will start to
care more about this.
Thanks,
Steve
[1] I think the only public version is Mali-G78AE which is a job manager
GPU which would be panfrost anyway. I've no idea what the roadmap is for
future GPUs.
>>> Thanks,
>>> Steve
>>>
>>>> +
>>>>
>>>> static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32
>>>> status) {
>>>>
>>>> bool has_unhandled_faults = false;
>>>>
>>>> @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
>>>> panthor_vm_op_ctx *op,>
>>>>
>>>> mutex_lock(&vm->op_lock);
>>>> vm->op_ctx = op;
>>>>
>>>> +
>>>> + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
>>>> + if (ret)
>>>> + goto out;
>>>> +
>>>>
>>>> switch (op_type) {
>>>>
>>>> case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
>>>> if (vm->unusable) {
>>>>
>>>> @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct
>>>> panthor_vm_op_ctx *op,>
>>>>
>>>> break;
>>>>
>>>> }
>>>>
>>>> + panthor_vm_unlock_region(vm);
>>>> +
>>>>
>>>> +out:
>>>> if (ret && flag_vm_unusable_on_failure)
>>>>
>>>> vm->unusable = true;
>
>
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 7/7] drm/panthor: Add support for repeated mappings
2025-07-15 15:17 ` Caterina Shablia
@ 2025-07-16 15:59 ` Steven Price
0 siblings, 0 replies; 36+ messages in thread
From: Steven Price @ 2025-07-16 15:59 UTC (permalink / raw)
To: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Boris Brezillon, Liviu Dudau, Lucas De Marchi,
Thomas Hellström, Rodrigo Vivi
Cc: dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 15/07/2025 16:17, Caterina Shablia wrote:
> On Friday 11 July 2025 at 16:03:26 (Central European Summer Time),
> Steven Price wrote:
>> On 07/07/2025 18:04, Caterina Shablia wrote:
>>> From: Boris Brezillon <boris.brezillon@collabora.com>
>>>
>>> This allows us to optimize mapping of a relatively small
>>> portion of a BO over and over in a large VA range, which
>>> is useful to support Vulkan sparse bindings in an efficient
>>> way.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>> Co-developed-by: Caterina Shablia <caterina.shablia@collabora.com>
>>> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
>>
>> This looks like the right sort of shape. From an uAPI perspective I'm
>> not sure whether u32 is the right type for bo_repeat_range. While I
>> can't immediately see a use for having a large repeat range it seems a
>> little silly to close it off when we're going to have padding afterwards
>> anyway. Obviously the kernel would reject large values because the
>> internal APIs are only u32. But it would enable this to be fixed if we
>> ever discover a usecase without a uAPI change.
>>
>>> ---
>>>
>>> drivers/gpu/drm/panthor/panthor_drv.c | 3 +-
>>> drivers/gpu/drm/panthor/panthor_mmu.c | 78 ++++++++++++++++++++++++---
>>> include/uapi/drm/panthor_drm.h | 23 ++++++++
>>> 3 files changed, 95 insertions(+), 9 deletions(-)
>>>
[...]
>>> +static int
>>> +panthor_vm_repeated_map_pages(struct panthor_vm *vm, u64 iova, int prot,
>>> + struct sg_table *sgt, u64 offset, u64 size,
>>> + u64 count)
>>> +{
>>> + /* FIXME: we really need to optimize this at the io_pgtable level. */
>>
>> Do you have plans for optimizing this?
> I personally don't have any plans, no, but maybe Boris does. I'll forward this
> question to him once he's back from his vacation.
>> How bad is the performance
>> without optimizing?
> It's better than the alternative of poking vm_bind with a morbillion
> drm_panthor_vm_bind_ops. More seriously, I don't really have any workloads
> besides VK CTS to measure, for now. There's some stuff we should try to do in
> panvk first, like using a 2M dummy_page and doing some gymnastics when mapping
> it so we get huge mappings, which will hopefully lessen the pressure on this
> loop.
Ok, sounds like some more investigation is needed. I'm a little wary of
a new feature which has a FIXME like this as it sounds like the design
hasn't been tested yet. I'm happy with the current code if it's indeed
"good enough", but if it's still too slow to be actually usable then I
think we need to fix it before merging rather than have a new feature
which isn't actually fast enough to use.
>>
>>> + for (u64 i = 0; i < count; i++) {
>>> + int ret;
>>> +
>>> + ret = panthor_vm_map_pages(vm, iova + (size * i), prot,
>>> + sgt, offset, size);
>>> + if (ret) {
>>> + panthor_vm_unmap_pages(vm, iova, size * (i - 1));
>>> + return ret;
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>>
[...]
>>>
>>> + /**
>>> + * @bo_repeat_range: The size of the range to be repeated.
>>> + *
>>> + * Must be zero if DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT is not set in
>>> + * flags.
>>> + *
>>> + * Size must be a multiple of bo_repeat_range.
>>> + */
>>> + __u32 bo_repeat_range;
>>> +
>>> + /** @pad: Padding field. MBZ. */
>>> + __u32 pad;
>>
>> If we're going to have the padding then the kernel needs to check that
>> this padding is zero, so that it can be available for future use.
> I decided to go with your suggestion to change bo_repeat_range to be an __u64,
> but rejecting vm_binds with values above 2^32-1 for now.
Yeah I think that's cleanest. Please do include a comment in the uapi
file about the 2^32-1 limit though.
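For illustration, a hedged sketch of what the discussed uapi field and the
corresponding kernel-side check could look like (the helper name below is a
placeholder, not the final uapi or driver code):

        /**
         * @bo_repeat_range: The size of the BO range to repeat over the VA range.
         *
         * Must be zero if DRM_PANTHOR_VM_BIND_OP_MAP_REPEAT is not set in
         * flags, and the VA range size must be a multiple of this value.
         * Values above 2^32 - 1 are rejected for now because the internal
         * helpers only take a 32-bit range.
         */
        __u64 bo_repeat_range;

        /* Hypothetical validation on the kernel side: */
        static int panthor_check_bo_repeat_range(u64 bo_repeat_range)
        {
                return bo_repeat_range > U32_MAX ? -EINVAL : 0;
        }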
Thanks,
Steve
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-07 17:04 ` [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged Caterina Shablia
2025-07-07 19:00 ` Danilo Krummrich
@ 2025-07-22 19:17 ` Adrian Larumbe
2025-08-21 11:54 ` Boris Brezillon
1 sibling, 1 reply; 36+ messages in thread
From: Adrian Larumbe @ 2025-07-22 19:17 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, dri-devel,
linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 07.07.2025 17:04, Caterina Shablia wrote:
> From: Boris Brezillon <boris.brezillon@collabora.com>
>
> We are going to add flags/properties that will impact the VA merging
> ability. Instead of sprinkling tests all over the place in
> __drm_gpuvm_sm_map(), let's add a helper aggregating all these checks
> can call it for every existing VA we walk through in the
> __drm_gpuvm_sm_map() loop.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> ---
> drivers/gpu/drm/drm_gpuvm.c | 47 +++++++++++++++++++++++++++++--------
> 1 file changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 05978c5c38b1..dc3c2f906400 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
> return fn->sm_step_unmap(&op, priv);
> }
>
> +static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
> + const struct drm_gpuva *b)
> +{
> + /* Only GEM-based mappings can be merged, and they must point to
> + * the same GEM object.
> + */
> + if (a->gem.obj != b->gem.obj || !a->gem.obj)
> + return false;
> +
> + /* Let's keep things simple for now and force all flags to match. */
> + if (a->flags != b->flags)
> + return false;
> +
> + /* Order VAs for the rest of the checks. */
> + if (a->va.addr > b->va.addr)
> + swap(a, b);
> +
> + /* We assume the caller already checked that VAs overlap or are
> + * contiguous.
> + */
> + if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
> + return false;
> +
> + /* We intentionally ignore u64 underflows because all we care about
> + * here is whether the VA diff matches the GEM offset diff.
> + */
> + return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
If we're reordering the VAs for the rest of the checks, when could underflow happen?
> +}
> +
> static int
> __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> const struct drm_gpuvm_ops *ops, void *priv,
> const struct drm_gpuvm_map_req *req)
> {
> struct drm_gpuva *va, *next;
> + struct drm_gpuva reqva = {
> + .va.addr = req->va.addr,
> + .va.range = req->va.range,
> + .gem.offset = req->gem.offset,
> + .gem.obj = req->gem.obj,
> + .flags = req->flags,
> + };
> u64 req_end = req->va.addr + req->va.range;
> int ret;
>
> @@ -2116,12 +2152,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> u64 addr = va->va.addr;
> u64 range = va->va.range;
> u64 end = addr + range;
> - bool merge = !!va->gem.obj;
> + bool merge = can_merge(gpuvm, va, &reqva);
>
> if (addr == req->va.addr) {
> - merge &= obj == req->gem.obj &&
> - offset == req->gem.offset;
> -
> if (end == req_end) {
> ret = op_unmap_cb(ops, priv, va, merge);
> if (ret)
> @@ -2163,8 +2196,6 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> };
> struct drm_gpuva_op_unmap u = { .va = va };
>
> - merge &= obj == req->gem.obj &&
> - offset + ls_range == req->gem.offset;
> u.keep = merge;
>
> if (end == req_end) {
> @@ -2196,10 +2227,6 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> break;
> }
> } else if (addr > req->va.addr) {
> - merge &= obj == req->gem.obj &&
> - offset == req->gem.offset +
> - (addr - req->va.addr);
> -
> if (end == req_end) {
> ret = op_unmap_cb(ops, priv, va, merge);
> if (ret)
> --
> 2.47.2
Adrian Larumbe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map
2025-07-07 17:04 ` [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map Caterina Shablia
2025-07-07 19:03 ` Danilo Krummrich
@ 2025-07-22 19:21 ` Adrian Larumbe
2025-08-21 12:21 ` Boris Brezillon
1 sibling, 1 reply; 36+ messages in thread
From: Adrian Larumbe @ 2025-07-22 19:21 UTC (permalink / raw)
To: Caterina Shablia
Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Frank Binns, Matt Coster, Karol Herbst, Lyude Paul,
Danilo Krummrich, Boris Brezillon, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, dri-devel,
linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 07.07.2025 17:04, Caterina Shablia wrote:
> From: Asahi Lina <lina@asahilina.net>
>
> drm_gpuva objects have a flags field. Currently, this can be managed by
> drivers out-of-band, without any special handling in drm_gpuvm.
>
> To be able to introduce flags that do affect the logic in the drm_gpuvm
> core, we need to plumb it through the map calls. This will allow the
> core to check the flags on map and alter the merge/split logic depending
> on the requested flags and the flags of the existing drm_gpuva ranges
> that are being split.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> ---
> drivers/gpu/drm/drm_gpuvm.c | 7 +++++++
> include/drm/drm_gpuvm.h | 9 +++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index dc3c2f906400..dd949a8853b0 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -2063,6 +2063,7 @@ op_map_cb(const struct drm_gpuvm_ops *fn, void *priv,
> op.map.va.range = req->va.range;
> op.map.gem.obj = req->gem.obj;
> op.map.gem.offset = req->gem.offset;
> + op.map.flags = req->flags;
>
> return fn->sm_step_map(&op, priv);
> }
> @@ -2175,6 +2176,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.range = range - req->va.range,
> .gem.obj = obj,
> .gem.offset = offset + req->va.range,
> + .flags = va->flags,
> };
> struct drm_gpuva_op_unmap u = {
> .va = va,
> @@ -2193,6 +2195,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.range = ls_range,
> .gem.obj = obj,
> .gem.offset = offset,
> + .flags = va->flags,
> };
> struct drm_gpuva_op_unmap u = { .va = va };
>
> @@ -2219,6 +2222,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .gem.obj = obj,
> .gem.offset = offset + ls_range +
> req->va.range,
> + .flags = va->flags,
> };
>
> ret = op_remap_cb(ops, priv, &p, &n, &u);
> @@ -2247,6 +2251,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> .va.range = end - req_end,
> .gem.obj = obj,
> .gem.offset = offset + req_end - addr,
> + .flags = va->flags,
> };
> struct drm_gpuva_op_unmap u = {
> .va = va,
> @@ -2290,6 +2295,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
> prev.va.range = req_addr - addr;
> prev.gem.obj = obj;
> prev.gem.offset = offset;
> + prev.flags = va->flags;
>
> prev_split = true;
> }
> @@ -2299,6 +2305,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
> next.va.range = end - req_end;
> next.gem.obj = obj;
> next.gem.offset = offset + (req_end - addr);
> + next.flags = va->flags;
>
> next_split = true;
> }
> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index a6e6c33fc10b..f77a89e791f1 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -847,6 +847,11 @@ struct drm_gpuva_op_map {
> */
> struct drm_gem_object *obj;
> } gem;
> +
> + /**
> + * @flags: requested flags for the &drm_gpuva for this mapping
> + */
> + enum drm_gpuva_flags flags;
> };
>
> /**
> @@ -1074,6 +1079,9 @@ struct drm_gpuvm_map_req {
> /** @offset: offset in the GEM */
> u64 offset;
> } gem;
> +
> + /** @flags: combination of DRM_GPUVA_ flags describing the mapping properties. */
> + enum drm_gpuva_flags flags;
Wouldn't this be better expressed as a u32 combination of enum drm_gpuva_flags flags?
Calling it 'flags' makes me feel like any OR'd combination of enum values would be possible.
> };
>
> struct drm_gpuva_ops *
> @@ -1097,6 +1105,7 @@ void drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
> static inline void drm_gpuva_init_from_op(struct drm_gpuva *va,
> struct drm_gpuva_op_map *op)
> {
> + va->flags = op->flags;
> va->va.addr = op->va.addr;
> va->va.range = op->va.range;
> va->gem.obj = op->gem.obj;
> --
> 2.47.2
Adrian Larumbe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-11 13:30 ` Steven Price
2025-07-15 15:08 ` Caterina Shablia
@ 2025-08-21 11:36 ` Boris Brezillon
1 sibling, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 11:36 UTC (permalink / raw)
To: Steven Price
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Liviu Dudau, Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Fri, 11 Jul 2025 14:30:21 +0100
Steven Price <steven.price@arm.com> wrote:
> On 07/07/2025 18:04, Caterina Shablia wrote:
> > From: Boris Brezillon <boris.brezillon@collabora.com>
> >
> > Move the lock/flush_mem operations around the gpuvm_sm_map() calls so
> > we can implement true atomic page updates, where any access in the
> > locked range done by the GPU has to wait for the page table updates
> > to land before proceeding.
> >
> > This is needed for vkQueueBindSparse(), so we can replace the dummy
> > page mapped over the entire object by actual BO backed pages in an atomic
> > way.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> > ---
> > drivers/gpu/drm/panthor/panthor_mmu.c | 65 +++++++++++++++++++++++++--
> > 1 file changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> > index b39ea6acc6a9..9caaba03c5eb 100644
> > --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> > +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> > @@ -387,6 +387,15 @@ struct panthor_vm {
> > * flagged as faulty as a result.
> > */
> > bool unhandled_fault;
> > +
> > +	/** @locked_region: Information about the currently locked region. */
> > + struct {
> > + /** @locked_region.start: Start of the locked region. */
> > + u64 start;
> > +
> > + /** @locked_region.size: Size of the locked region. */
> > + u64 size;
> > + } locked_region;
> > };
> >
> > /**
> > @@ -775,6 +784,10 @@ int panthor_vm_active(struct panthor_vm *vm)
> > }
> >
> > ret = panthor_mmu_as_enable(vm->ptdev, vm->as.id, transtab, transcfg, vm->memattr);
> > + if (!ret && vm->locked_region.size) {
> > + lock_region(ptdev, vm->as.id, vm->locked_region.start, vm->locked_region.size);
> > + ret = wait_ready(ptdev, vm->as.id);
> > + }
>
> Do we need to do this? It seems odd to restore a MMU context and
> immediately set a lock region. Is there a good reason we can't just
> WARN_ON if there's a lock region set in panthor_vm_idle()?
>
> I think we need to briefly take vm->op_lock to ensure synchronisation
> but that doesn't seem a big issue. Or perhaps there's a good reason that
> I'm missing?
Hm, I wish I had written a big fat comment along with this change, because
indeed, it seems simpler to take the op_lock to ensure any in-flight PT
update is flushed before we make the AS active, and I definitely don't
remember why I didn't do that. Could be some lock ordering inversion
between the slots lock and the op lock, or maybe I overthought this at the
time. In any case, it looks like it's worth a try.
>
> >
> > out_make_active:
> > if (!ret) {
> > @@ -902,6 +915,9 @@ static int panthor_vm_unmap_pages(struct panthor_vm *vm, u64 iova, u64 size)
> > struct io_pgtable_ops *ops = vm->pgtbl_ops;
> > u64 offset = 0;
> >
> > + drm_WARN_ON(&ptdev->base,
> > + (iova < vm->locked_region.start) ||
> > + (iova + size > vm->locked_region.start + vm->locked_region.size));
> > drm_dbg(&ptdev->base, "unmap: as=%d, iova=%llx, len=%llx", vm->as.id, iova, size);
> >
> > while (offset < size) {
> > @@ -915,13 +931,12 @@ static int panthor_vm_unmap_pages(struct panthor_vm *vm, u64 iova, u64 size)
> > iova + offset + unmapped_sz,
> > iova + offset + pgsize * pgcount,
> > iova, iova + size);
> > - panthor_vm_flush_range(vm, iova, offset + unmapped_sz);
> > return -EINVAL;
> > }
> > offset += unmapped_sz;
> > }
> >
> > - return panthor_vm_flush_range(vm, iova, size);
> > + return 0;
> > }
> >
> > static int
> > @@ -938,6 +953,10 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> > if (!size)
> > return 0;
> >
> > + drm_WARN_ON(&ptdev->base,
> > + (iova < vm->locked_region.start) ||
> > + (iova + size > vm->locked_region.start + vm->locked_region.size));
> > +
> > for_each_sgtable_dma_sg(sgt, sgl, count) {
> > dma_addr_t paddr = sg_dma_address(sgl);
> > size_t len = sg_dma_len(sgl);
> > @@ -985,7 +1004,7 @@ panthor_vm_map_pages(struct panthor_vm *vm, u64 iova, int prot,
> > offset = 0;
> > }
> >
> > - return panthor_vm_flush_range(vm, start_iova, iova - start_iova);
> > + return 0;
> > }
> >
> > static int flags_to_prot(u32 flags)
> > @@ -1654,6 +1673,38 @@ static const char *access_type_name(struct panthor_device *ptdev,
> > }
> > }
> >
> > +static int panthor_vm_lock_region(struct panthor_vm *vm, u64 start, u64 size)
> > +{
> > + struct panthor_device *ptdev = vm->ptdev;
> > + int ret = 0;
> > +
> > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > + drm_WARN_ON(&ptdev->base, vm->locked_region.start || vm->locked_region.size);
> > + vm->locked_region.start = start;
> > + vm->locked_region.size = size;
> > + if (vm->as.id >= 0) {
> > + lock_region(ptdev, vm->as.id, start, size);
> > + ret = wait_ready(ptdev, vm->as.id);
> > + }
> > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +
> > + return ret;
> > +}
> > +
> > +static void panthor_vm_unlock_region(struct panthor_vm *vm)
> > +{
> > + struct panthor_device *ptdev = vm->ptdev;
> > +
> > + mutex_lock(&ptdev->mmu->as.slots_lock);
> > + if (vm->as.id >= 0) {
> > + write_cmd(ptdev, vm->as.id, AS_COMMAND_FLUSH_MEM);
> > + drm_WARN_ON(&ptdev->base, wait_ready(ptdev, vm->as.id));
> > + }
> > + vm->locked_region.start = 0;
> > + vm->locked_region.size = 0;
> > + mutex_unlock(&ptdev->mmu->as.slots_lock);
> > +}
>
> Do we need to include a drm_dev_enter() somewhere here? I note that
> panthor_vm_flush_range() has one and you've effectively replaced that
> code with the above.
>
> Thanks,
> Steve
>
> > +
> > static void panthor_mmu_irq_handler(struct panthor_device *ptdev, u32 status)
> > {
> > bool has_unhandled_faults = false;
> > @@ -2179,6 +2230,11 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
> >
> > mutex_lock(&vm->op_lock);
> > vm->op_ctx = op;
> > +
> > + ret = panthor_vm_lock_region(vm, op->va.addr, op->va.range);
> > + if (ret)
> > + goto out;
> > +
> > switch (op_type) {
> > case DRM_PANTHOR_VM_BIND_OP_TYPE_MAP:
> > if (vm->unusable) {
> > @@ -2199,6 +2255,9 @@ panthor_vm_exec_op(struct panthor_vm *vm, struct panthor_vm_op_ctx *op,
> > break;
> > }
> >
> > + panthor_vm_unlock_region(vm);
> > +
> > +out:
> > if (ret && flag_vm_unusable_on_failure)
> > vm->unusable = true;
> >
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-07-16 15:43 ` Steven Price
@ 2025-08-21 11:51 ` Boris Brezillon
2025-08-21 15:02 ` Steven Price
0 siblings, 1 reply; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 11:51 UTC (permalink / raw)
To: Steven Price
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Liviu Dudau, Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Wed, 16 Jul 2025 16:43:24 +0100
Steven Price <steven.price@arm.com> wrote:
> >>> I think we need to briefly take vm->op_lock to ensure synchronisation
> >>> but that doesn't seem a big issue. Or perhaps there's a good reason that
> >>> I'm missing?
> >>
> >> I think you're right, all other accesses to locked_region are guarded by
> >> op_lock. GPU job submit poke vm_active concurrently with vm_bind jobs doing
> >> region {,un}locks.
> > Actually no, that's not necessary. Access to locked_region is protected by
> > slots_lock, which is held here. Trying to lock vm->op_lock would also be
> > detrimental here, because these locks are often taken together and slots_lock
> > is taken after op_lock is taken, so taking op_lock here would be extremely
> > deadlockful.
>
> It would obviously be necessary to acquire vm->op_lock before
> as.slots_lock as you say to avoid deadlocks. Note that as soon as
> as.slots_lock is held vm->op_lock can be dropped.
Yeah, lock ordering is not an issue, because we take slots_lock in this
function, so we're in full control of the ordering. And I wouldn't even
consider releasing op_lock as soon as we acquire slots_lock because
- that makes things harder to reason about
- the locked section is not blocking on any sort of external event
- the locked section is pretty straightforward (so no excessive delays
expected here)
>
> I just find the current approach a little odd, and unless there's a good
> reason for it would prefer that we don't enable a VM on a new address
> space while there's an outstanding vm_bind still running. Obviously if
> there's a good reason (e.g. we really do expect long running vm_bind
> operations) then that just need documenting in the commit message. But
> I'm not aware that's the case here.
I fully agree here. If there's no obvious reason to not serialize
vm_active() on VM bind ops, I'd opt for taking the VM op_lock and
calling it a day. And I honestly can't think of any:
- the VM op logic is all synchronous/non-blocking
- it's expected to be fast
- AS rotation is something I hope is not happening too often, otherwise
we'll have other things to worry about (the whole CSG slot scheduling
logic is quite involved, and I'd expect the
BIND-while-making-AS-active to be rare enough that it becomes noise
in the overall overhead of kernel-side GPU scheduling happening in
Panthor)
>
> Although in general I'm a bit wary of relying on the whole lock region
> feature - previous GPUs have an errata. But maybe I'm being over
> cautious there.
We're heavily relying on it already to allow updates of the VM while
the GPU is executing stuff. If that's problematic on v10+, I'd rather
know early :D.
Regards,
Boris
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct
2025-07-07 18:44 ` Danilo Krummrich
@ 2025-08-21 11:53 ` Boris Brezillon
0 siblings, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 11:53 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, dri-devel,
linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Mon, 07 Jul 2025 20:44:49 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> > +/**
> > + * struct drm_gpuvm_map_req - arguments passed to drm_gpuvm_sm_map[_ops_create]()
> > + */
> > +struct drm_gpuvm_map_req {
> > + /** @va: virtual address related fields */
> > + struct {
> > + /** @va.addr: start of the virtual address range to map to */
> > + u64 addr;
> > +
> > + /** @va.size: size of the virtual address range to map to */
> > + u64 range;
> > + } va;
> > +
> > + /** @gem: GEM related fields */
> > + struct {
> > + /**
> > + * @obj: GEM object to map.
> > + *
> > + * Can be NULL if the virtual range is not backed by a GEM object.
> > + */
> > + struct drm_gem_object *obj;
> > +
> > + /** @offset: offset in the GEM */
> > + u64 offset;
> > + } gem;
> > +};
>
> Can't we just use struct drm_gpuva_op_map instead? The structure is identical
> and represents exactly what we want, a map operation.
Sure, we can do that. The only reason I didn't is because I thought it
was preferable to distinguish a map request from the sub-operation it
might be split into (the latter being described by drm_gpuva_op_map).
>
> Let's use that instead of redefining an identical structure.
>
> With that,
>
> Acked-by: Danilo Krummrich <dakr@kernel.org>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-22 19:17 ` Adrian Larumbe
@ 2025-08-21 11:54 ` Boris Brezillon
0 siblings, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 11:54 UTC (permalink / raw)
To: Adrian Larumbe
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Steven Price, Liviu Dudau, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, dri-devel, linux-kernel, nouveau, intel-xe, asahi,
Asahi Lina
On Tue, 22 Jul 2025 20:17:14 +0100
Adrian Larumbe <adrian.larumbe@collabora.com> wrote:
> On 07.07.2025 17:04, Caterina Shablia wrote:
> > From: Boris Brezillon <boris.brezillon@collabora.com>
> >
> > We are going to add flags/properties that will impact the VA merging
> > ability. Instead of sprinkling tests all over the place in
> > __drm_gpuvm_sm_map(), let's add a helper aggregating all these checks
> > can call it for every existing VA we walk through in the
> > __drm_gpuvm_sm_map() loop.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Signed-off-by: Caterina Shablia <caterina.shablia@collabora.com>
> > ---
> > drivers/gpu/drm/drm_gpuvm.c | 47 +++++++++++++++++++++++++++++--------
> > 1 file changed, 37 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index 05978c5c38b1..dc3c2f906400 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
> > return fn->sm_step_unmap(&op, priv);
> > }
> >
> > +static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
> > + const struct drm_gpuva *b)
> > +{
> > + /* Only GEM-based mappings can be merged, and they must point to
> > + * the same GEM object.
> > + */
> > + if (a->gem.obj != b->gem.obj || !a->gem.obj)
> > + return false;
> > +
> > + /* Let's keep things simple for now and force all flags to match. */
> > + if (a->flags != b->flags)
> > + return false;
> > +
> > + /* Order VAs for the rest of the checks. */
> > + if (a->va.addr > b->va.addr)
> > + swap(a, b);
> > +
> > + /* We assume the caller already checked that VAs overlap or are
> > + * contiguous.
> > + */
> > + if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
> > + return false;
> > +
> > + /* We intentionally ignore u64 underflows because all we care about
> > + * here is whether the VA diff matches the GEM offset diff.
> > + */
> > + return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
>
> If we're reordering the VAs for the rest of the checks, when could underflow happen?
I think this comment predates the re-ordering (I originally tried not
to order VAs).
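For what it's worth, a toy illustration of the underflow case being asked
about (made-up values, plain userspace C, mirroring just the tail of
can_merge() after a/b have been ordered by va.addr):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* b maps a *lower* GEM offset than a, so the offset diff underflows. */
        uint64_t a_addr = 0x3000, a_off = 0x2000;
        uint64_t b_addr = 0x4000, b_off = 0x1000;

        uint64_t va_diff  = b_addr - a_addr; /* 0x1000, cannot underflow */
        uint64_t off_diff = b_off - a_off;   /* wraps to 0xfffffffffffff000 */

        /* The comparison simply fails, i.e. "don't merge", which is correct. */
        printf("mergeable: %d\n", va_diff == off_diff);
        return 0;
}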
>
> > +}
> > +
> > static int
> > __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > const struct drm_gpuvm_ops *ops, void *priv,
> > const struct drm_gpuvm_map_req *req)
> > {
> > struct drm_gpuva *va, *next;
> > + struct drm_gpuva reqva = {
> > + .va.addr = req->va.addr,
> > + .va.range = req->va.range,
> > + .gem.offset = req->gem.offset,
> > + .gem.obj = req->gem.obj,
> > + .flags = req->flags,
> > + };
> > u64 req_end = req->va.addr + req->va.range;
> > int ret;
> >
> > @@ -2116,12 +2152,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > u64 addr = va->va.addr;
> > u64 range = va->va.range;
> > u64 end = addr + range;
> > - bool merge = !!va->gem.obj;
> > + bool merge = can_merge(gpuvm, va, &reqva);
> >
> > if (addr == req->va.addr) {
> > - merge &= obj == req->gem.obj &&
> > - offset == req->gem.offset;
> > -
> > if (end == req_end) {
> > ret = op_unmap_cb(ops, priv, va, merge);
> > if (ret)
> > @@ -2163,8 +2196,6 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > };
> > struct drm_gpuva_op_unmap u = { .va = va };
> >
> > - merge &= obj == req->gem.obj &&
> > - offset + ls_range == req->gem.offset;
> > u.keep = merge;
> >
> > if (end == req_end) {
> > @@ -2196,10 +2227,6 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > break;
> > }
> > } else if (addr > req->va.addr) {
> > - merge &= obj == req->gem.obj &&
> > - offset == req->gem.offset +
> > - (addr - req->va.addr);
> > -
> > if (end == req_end) {
> > ret = op_unmap_cb(ops, priv, va, merge);
> > if (ret)
> > --
> > 2.47.2
>
>
> Adrian Larumbe
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-07 19:00 ` Danilo Krummrich
2025-07-07 19:06 ` Danilo Krummrich
@ 2025-08-21 12:06 ` Boris Brezillon
1 sibling, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 12:06 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, dri-devel,
linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Mon, 07 Jul 2025 21:00:54 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index 05978c5c38b1..dc3c2f906400 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
> > return fn->sm_step_unmap(&op, priv);
> > }
> >
> > +static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
> > + const struct drm_gpuva *b)
> > +{
> > + /* Only GEM-based mappings can be merged, and they must point to
> > + * the same GEM object.
> > + */
> > + if (a->gem.obj != b->gem.obj || !a->gem.obj)
> > + return false;
> > +
> > + /* Let's keep things simple for now and force all flags to match. */
> > + if (a->flags != b->flags)
> > + return false;
> > +
> > + /* Order VAs for the rest of the checks. */
> > + if (a->va.addr > b->va.addr)
> > + swap(a, b);
> > +
> > + /* We assume the caller already checked that VAs overlap or are
> > + * contiguous.
> > + */
> > + if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
> > + return false;
> > +
> > + /* We intentionally ignore u64 underflows because all we care about
> > + * here is whether the VA diff matches the GEM offset diff.
> > + */
> > + return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
> > +}
> > +
> > static int
> > __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > const struct drm_gpuvm_ops *ops, void *priv,
> > const struct drm_gpuvm_map_req *req)
> > {
> > struct drm_gpuva *va, *next;
> > + struct drm_gpuva reqva = {
> > + .va.addr = req->va.addr,
> > + .va.range = req->va.range,
> > + .gem.offset = req->gem.offset,
> > + .gem.obj = req->gem.obj,
> > + .flags = req->flags,
>
> Huh? Where does req->flags come from? I don't remember that this flag exists in
> struct drm_gpuvm_map_req in the preceding patch?
Oops, I re-ordered commits, and forgot to verify that the series was
bisectable. This should be part of patch 4 actually.
>
> > + };
> > u64 req_end = req->va.addr + req->va.range;
> > int ret;
> >
> > @@ -2116,12 +2152,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > u64 addr = va->va.addr;
> > u64 range = va->va.range;
> > u64 end = addr + range;
> > - bool merge = !!va->gem.obj;
> > + bool merge = can_merge(gpuvm, va, &reqva);
>
> I know you want to do the swap() trick above, but I don't like creating a
> temporary struct drm_gpuva with all the other uninitialized fields.
I mean, I could do it the other way around (gpuva -> op_map), but it
means doing it for each va we cross.
>
> If you really want this, can we please limit the scope? Maybe the following
> helper:
>
> static bool can_merge(struct drm_gpuvm *gpuvm,
> const struct drm_gpuva *va,
> struct drm_gpuvm_map_req *req)
> {
> struct drm_gpuva reqva = { ... };
> return __can_merge(gpuvm, va, reqva);
It's a bit of a shame though, because then this reqva is
initialized every time can_merge() is called, instead of once at the
beginning of an sm_map() operation. But maybe the compiler is smart
enough to see through it when inlining (assuming it actually inlines
the check).
> }
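For reference, a fleshed-out version of that wrapper might look as follows
(purely illustrative: it reuses the checks from the patch as an inner
__can_merge(), and neither helper name exists in drm_gpuvm.c today):

static bool __can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
			const struct drm_gpuva *b)
{
	/* Same checks as in the patch, just moved into an inner helper. */
	if (a->gem.obj != b->gem.obj || !a->gem.obj)
		return false;

	if (a->flags != b->flags)
		return false;

	if (a->va.addr > b->va.addr)
		swap(a, b);

	if (drm_WARN_ON(gpuvm->drm, b->va.addr > a->va.addr + a->va.range))
		return false;

	return b->va.addr - a->va.addr == b->gem.offset - a->gem.offset;
}

static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *va,
		      const struct drm_gpuvm_map_req *req)
{
	/* Built per call; once inlined, the compiler can usually hoist this. */
	struct drm_gpuva reqva = {
		.va.addr = req->va.addr,
		.va.range = req->va.range,
		.gem.obj = req->gem.obj,
		.gem.offset = req->gem.offset,
		.flags = req->flags,
	};

	return __can_merge(gpuvm, va, &reqva);
}

This keeps the temporary struct drm_gpuva confined to the helper, at the cost
of rebuilding it on every call, which is the trade-off discussed above.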
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged
2025-07-07 19:06 ` Danilo Krummrich
@ 2025-08-21 12:18 ` Boris Brezillon
0 siblings, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 12:18 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, dri-devel,
linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Mon, 07 Jul 2025 21:06:50 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Mon Jul 7, 2025 at 9:00 PM CEST, Danilo Krummrich wrote:
> > On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> >> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> >> index 05978c5c38b1..dc3c2f906400 100644
> >> --- a/drivers/gpu/drm/drm_gpuvm.c
> >> +++ b/drivers/gpu/drm/drm_gpuvm.c
> >> @@ -2098,12 +2098,48 @@ op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
> >> return fn->sm_step_unmap(&op, priv);
> >> }
> >>
> >> +static bool can_merge(struct drm_gpuvm *gpuvm, const struct drm_gpuva *a,
> >> + const struct drm_gpuva *b)
> >> +{
> >> + /* Only GEM-based mappings can be merged, and they must point to
> >> + * the same GEM object.
> >> + */
> >> + if (a->gem.obj != b->gem.obj || !a->gem.obj)
> >> + return false;
> >> +
> >> + /* Let's keep things simple for now and force all flags to match. */
> >> + if (a->flags != b->flags)
> >> + return false;
>
> Forgot to mention, this can include driver specific flags. How do we know from
> the generic code whether this condition makes sense? *At least* it would need to
> be documented.
You're right, it should have been:
if ((a->flags & DRM_GPUVA_MERGEABLE_FLAGS_MASK) !=
(b->flags & DRM_GPUVA_MERGEABLE_FLAGS_MASK))
return false;
with DRM_GPUVA_MERGEABLE_FLAGS_MASK defined as the set of flags that matter
when merging.
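Purely as an illustration of the shape (which flags actually belong in the
mask is exactly the open question, and DRM_GPUVA_REPEAT only appears later in
this series):

#define DRM_GPUVA_MERGEABLE_FLAGS_MASK	(DRM_GPUVA_SPARSE | DRM_GPUVA_REPEAT)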
>
> However, I think it would be better to provide an optional callback for drivers
> to check whether merge makes sense or not. This doesn't mean we need drivers to
> do those common checks, this can remain here in the common code.
Seems a bit premature to me. Again, if there's a need for drivers to add
extra checks we can always add a callback at that point, but until
that's the case I'd rather stick to these common checks.
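For completeness, the optional hook being discussed could take roughly this
shape (hypothetical: no such callback exists in struct drm_gpuvm_ops today,
and per the above it is considered premature for now):

	/* Hypothetical drm_gpuvm_ops member letting a driver veto a merge. */
	bool (*va_mergeable)(const struct drm_gpuva *a, const struct drm_gpuva *b);

	/* In can_merge(), after the common checks: */
	if (gpuvm->ops->va_mergeable && !gpuvm->ops->va_mergeable(a, b))
		return false;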
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map
2025-07-22 19:21 ` Adrian Larumbe
@ 2025-08-21 12:21 ` Boris Brezillon
0 siblings, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 12:21 UTC (permalink / raw)
To: Adrian Larumbe
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Steven Price, Liviu Dudau, Lucas De Marchi, Thomas Hellström,
Rodrigo Vivi, dri-devel, linux-kernel, nouveau, intel-xe, asahi,
Asahi Lina
On Tue, 22 Jul 2025 20:21:25 +0100
Adrian Larumbe <adrian.larumbe@collabora.com> wrote:
> > /**
> > @@ -1074,6 +1079,9 @@ struct drm_gpuvm_map_req {
> > /** @offset: offset in the GEM */
> > u64 offset;
> > } gem;
> > +
> > + /** @flags: combination of DRM_GPUVA_ flags describing the mapping properties. */
> > + enum drm_gpuva_flags flags;
>
> Wouldn't this be better expressed as a u32 holding a combination of enum drm_gpuva_flags values?
> Calling it 'flags' suggests that any OR'd combination of enum values should be possible.
I agree, but it's how other flags fields are defined in gpuvm, so I went
for the same definition to keep things consistent. Luckily (or
unfortunately, depending on where you place yourself), C is pretty lax
about what can be assigned to an enum type.
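For instance, something like this typically compiles without complaint even
though the OR'd value is not itself an enumerator (illustrative only,
DRM_GPUVA_REPEAT being the flag added later in this series):

	enum drm_gpuva_flags flags = DRM_GPUVA_SPARSE | DRM_GPUVA_REPEAT;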
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic
2025-07-07 19:33 ` Danilo Krummrich
@ 2025-08-21 12:29 ` Boris Brezillon
0 siblings, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 12:29 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Steven Price, Liviu Dudau,
Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, dri-devel,
linux-kernel, nouveau, intel-xe, asahi, Asahi Lina, Asahi Lina
On Mon, 07 Jul 2025 21:33:13 +0200
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Mon Jul 7, 2025 at 7:04 PM CEST, Caterina Shablia wrote:
> > From: Asahi Lina <lina+kernel@asahilina.net>
> >
> > To be able to support "fake sparse" mappings without relying on GPU page
> > fault handling, drivers may need to create large (e.g. 4GiB) mappings of
> > the same page repeatedly (or same range of pages). Doing this through
> > individual mappings would be very wasteful. This can be handled better
> > by using a flag on map creation, but to do it safely, drm_gpuvm needs to
> > be aware of this special case.
> >
> > Add a flag that signals that a given mapping is a page mapping, which is
> > repeated all over the entire requested VA range. This tweaks the
> > sm_map() logic to treat the GEM offsets differently when mappings are
> > repeated ones, so they are not incremented as they would be with regular
> > mappings.
> >
> > The size of the GEM portion to repeat is passed through
> > drm_gpuva::gem::range. Most of the time it will be a page size, but
> > it can be bigger as long as it's less that drm_gpuva::va::range, and
> > drm_gpuva::gem::range is a multiple of drm_gpuva::va::range.
>
> Should be "as long as it's less than drm_gpuva::va::range, and
> drm_gpuva::va::range is a multiple of drm_gpuva::gem::range".
>
> I also think this feature deserves its own section in the global GPUVM
> documentation -- please add a corresponding paragraph.
Sure.
>
> > +static int check_map_req(struct drm_gpuvm *gpuvm,
>
> Let's call this validate_map_request().
I can do that, sure.
>
> > + const struct drm_gpuvm_map_req *req)
> > +{
> > + if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
> > + return -EINVAL;
> > +
> > + if (req->flags & DRM_GPUVA_REPEAT) {
> > + u64 va_range = req->va.range;
> > +
> > + /* For a repeated mapping, GEM range must be > 0 and
> > + * the VA range must be a multiple of it.
> > + */
> > + if (unlikely(!req->gem.range ||
> > + va_range < req->gem.range ||
> > + do_div(va_range, req->gem.range)))
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int
> > __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > const struct drm_gpuvm_ops *ops, void *priv,
> > @@ -2137,6 +2179,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > struct drm_gpuva reqva = {
> > .va.addr = req->va.addr,
> > .va.range = req->va.range,
> > + .gem.range = req->gem.range,
> > .gem.offset = req->gem.offset,
> > .gem.obj = req->gem.obj,
> > .flags = req->flags,
> > @@ -2144,7 +2187,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > u64 req_end = req->va.addr + req->va.range;
> > int ret;
> >
> > - if (unlikely(!drm_gpuvm_range_valid(gpuvm, req->va.addr, req->va.range)))
> > + ret = check_map_req(gpuvm, req);
> > + if (unlikely(ret))
> > return -EINVAL;
> >
> > drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req->va.addr, req_end) {
> > @@ -2175,7 +2219,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > .va.addr = req_end,
> > .va.range = range - req->va.range,
> > .gem.obj = obj,
> > - .gem.offset = offset + req->va.range,
> > + .gem.range = va->gem.range,
> > + .gem.offset = offset,
>
> Why change this from offset + req->va.range to just offset?
The offset is updated further down, conditionally on DRM_GPUVA_REPEAT not
being set, because we don't want to advance the GEM offset when the mapped
portion is repeated.
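As a concrete illustration (hypothetical numbers): take a gpuva mapping GEM
offset 0x0 over VA [0x100000, 0x140000) that gets split by a map request
covering its first 0x1000 bytes. For a regular mapping, the remainder must
start at gem.offset 0x1000; for a DRM_GPUVA_REPEAT mapping it keeps
gem.offset 0x0, because each repeated gpuva just tiles the same
[gem.offset, gem.offset + gem.range) window over its own VA range.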
>
> Same for similar other changes below.
>
> Also it seems that we need to update the documentation which shows all potential
> cases when calling __drm_gpuvm_sm_map() [1].
Yep, will do.
>
> [1] https://docs.kernel.org/gpu/drm-mm.html#split-and-merge
>
> > .flags = va->flags,
> > };
> > struct drm_gpuva_op_unmap u = {
> > @@ -2183,6 +2228,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > .keep = merge,
> > };
> >
> > + if (!(va->flags & DRM_GPUVA_REPEAT))
> > + n.gem.offset += req->va.range;
> > +
> > ret = op_remap_cb(ops, priv, NULL, &n, &u);
> > if (ret)
> > return ret;
> > @@ -2194,6 +2242,7 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > .va.addr = addr,
> > .va.range = ls_range,
> > .gem.obj = obj,
> > + .gem.range = va->gem.range,
> > .gem.offset = offset,
> > .flags = va->flags,
> > };
> > @@ -2220,11 +2269,14 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > .va.addr = req_end,
> > .va.range = end - req_end,
> > .gem.obj = obj,
> > - .gem.offset = offset + ls_range +
> > - req->va.range,
> > + .gem.range = va->gem.range,
> > + .gem.offset = offset,
> > .flags = va->flags,
> > };
> >
> > + if (!(va->flags & DRM_GPUVA_REPEAT))
> > + n.gem.offset += ls_range + req->va.range;
> > +
> > ret = op_remap_cb(ops, priv, &p, &n, &u);
> > if (ret)
> > return ret;
> > @@ -2250,7 +2302,8 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > .va.addr = req_end,
> > .va.range = end - req_end,
> > .gem.obj = obj,
> > - .gem.offset = offset + req_end - addr,
> > + .gem.range = va->gem.range,
> > + .gem.offset = offset,
> > .flags = va->flags,
> > };
> > struct drm_gpuva_op_unmap u = {
> > @@ -2258,6 +2311,9 @@ __drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
> > .keep = merge,
> > };
> >
> > + if (!(va->flags & DRM_GPUVA_REPEAT))
> > + n.gem.offset += req_end - addr;
> > +
> > ret = op_remap_cb(ops, priv, NULL, &n, &u);
> > if (ret)
> > return ret;
> > @@ -2294,6 +2350,7 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
> > prev.va.addr = addr;
> > prev.va.range = req_addr - addr;
> > prev.gem.obj = obj;
> > + prev.gem.range = va->gem.range;
> > prev.gem.offset = offset;
> > prev.flags = va->flags;
> >
> > @@ -2304,7 +2361,10 @@ __drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
> > next.va.addr = req_end;
> > next.va.range = end - req_end;
> > next.gem.obj = obj;
> > - next.gem.offset = offset + (req_end - addr);
> > + next.gem.range = va->gem.range;
> > + next.gem.offset = offset;
> > + if (!(va->flags & DRM_GPUVA_REPEAT))
> > + next.gem.offset += req_end - addr;
> > next.flags = va->flags;
> >
> > next_split = true;
> > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > index f77a89e791f1..629e8508f99f 100644
> > --- a/include/drm/drm_gpuvm.h
> > +++ b/include/drm/drm_gpuvm.h
> > @@ -56,10 +56,19 @@ enum drm_gpuva_flags {
> > */
> > DRM_GPUVA_SPARSE = (1 << 1),
> >
> > + /**
> > + * @DRM_GPUVA_REPEAT:
> > + *
> > + * Flag indicating that the &drm_gpuva is a mapping of a GEM
> > + * portion repeated multiple times to fill the virtual address
>
> "of a GEM object with a certain range that is repeated multiple times to ..."
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-08-21 11:51 ` Boris Brezillon
@ 2025-08-21 15:02 ` Steven Price
2025-08-21 15:15 ` Boris Brezillon
0 siblings, 1 reply; 36+ messages in thread
From: Steven Price @ 2025-08-21 15:02 UTC (permalink / raw)
To: Boris Brezillon
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Liviu Dudau, Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On 21/08/2025 12:51, Boris Brezillon wrote:
> On Wed, 16 Jul 2025 16:43:24 +0100
> Steven Price <steven.price@arm.com> wrote:
[...]
>> Although in general I'm a bit wary of relying on the whole lock region
>> feature - previous GPUs have errata. But maybe I'm being over-cautious
>> there.
>
> We're heavily relying on it already to allow updates of the VM while
> the GPU is executing stuff. If that's problematic on v10+, I'd rather
> know early :D.
I think I'm just scarred by my experiences over a decade ago... ;)
I'm not aware of any issues with the modern[1] GPUs. The issue used to
be that the lock region could get accidentally unlocked by a cache flush
from another source - specifically the cache flush on job start flag.
It's also not a major issue if you keep the page tables consistent: the
lock region in theory allows a region to be in an inconsistent state,
but generally there's no need for that. AFAIK we mostly keep the tables
consistent anyway.
Thanks,
Steve
[1] Which in this context means well over a decade - it's somewhat scary
how long I've been doing this!
> Regards,
>
> Boris
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates
2025-08-21 15:02 ` Steven Price
@ 2025-08-21 15:15 ` Boris Brezillon
0 siblings, 0 replies; 36+ messages in thread
From: Boris Brezillon @ 2025-08-21 15:15 UTC (permalink / raw)
To: Steven Price
Cc: Caterina Shablia, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
Matt Coster, Karol Herbst, Lyude Paul, Danilo Krummrich,
Liviu Dudau, Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
dri-devel, linux-kernel, nouveau, intel-xe, asahi, Asahi Lina
On Thu, 21 Aug 2025 16:02:09 +0100
Steven Price <steven.price@arm.com> wrote:
> On 21/08/2025 12:51, Boris Brezillon wrote:
> > On Wed, 16 Jul 2025 16:43:24 +0100
> > Steven Price <steven.price@arm.com> wrote:
> [...]
> >> Although in general I'm a bit wary of relying on the whole lock region
> >> feature - previous GPUs have errata. But maybe I'm being over-cautious
> >> there.
> >
> > We're heavily relying on it already to allow updates of the VM while
> > the GPU is executing stuff. If that's problematic on v10+, I'd rather
> > know early :D.
>
> I think I'm just scarred by my experiences over a decade ago... ;)
>
> I'm not aware of any issues with the modern[1] GPUs. The issue used to
> be that the lock region could get accidentally unlocked by a cache flush
> from another source - specifically the cache flush on job start flag.
>
> It's also not a major issue if you keep the page tables consistent: the
> lock region in theory allows a region to be in an inconsistent state,
> but generally there's no need for that. AFAIK we mostly keep the tables
> consistent anyway.
Right, it's not a problem until we introduce sparse binding support, at
which point atomicity becomes important. Given remapping is not something
the io-pgtable layer provides (a remap has to be an unmap+map), I need to
rely on region locking to make it work, or we'll have to eat the
fault-but-not-really-because-it's-being-remapped overhead/complexity.
Honestly, I'd rather rely on region locking if it works, because it's far
simpler ;-).
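Spelled out, the remap-under-lock sequence referred to here is roughly the
following (pseudocode; lock_region()/flush_range()/unlock_region() are
hypothetical helper names, only the map_pages()/unmap_pages() hooks are the
actual io-pgtable entry points):

	lock_region(as, iova, size);    /* hold GPU accesses to [iova, iova + size) */
	ops->unmap_pages(ops, iova, pgsize, pgcount, NULL);
	ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, GFP_KERNEL, &mapped);
	flush_range(as, iova, size);    /* flush stale TLB/page-table-walk caches */
	unlock_region(as);              /* only now let the GPU observe the new mapping */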
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread (newest message: 2025-08-21 15:15 UTC)
Thread overview: 36+ messages
2025-07-07 17:04 [PATCH v4 0/7] drm/panthor: support repeated mappings Caterina Shablia
2025-07-07 17:04 ` [PATCH v4 1/7] drm/panthor: Add support for atomic page table updates Caterina Shablia
2025-07-11 13:30 ` Steven Price
2025-07-15 15:08 ` Caterina Shablia
2025-07-15 15:33 ` Caterina Shablia
2025-07-16 15:43 ` Steven Price
2025-08-21 11:51 ` Boris Brezillon
2025-08-21 15:02 ` Steven Price
2025-08-21 15:15 ` Boris Brezillon
2025-07-15 16:09 ` Caterina Shablia
2025-07-16 15:53 ` Steven Price
2025-08-21 11:36 ` Boris Brezillon
2025-07-07 17:04 ` [PATCH v4 2/7] drm/gpuvm: Kill drm_gpuva_init() Caterina Shablia
2025-07-07 18:41 ` Danilo Krummrich
2025-07-11 13:30 ` Steven Price
2025-07-07 17:04 ` [PATCH v4 3/7] drm/gpuvm: Pass map arguments through a struct Caterina Shablia
2025-07-07 18:44 ` Danilo Krummrich
2025-08-21 11:53 ` Boris Brezillon
2025-07-07 17:04 ` [PATCH v4 4/7] drm/gpuvm: Add a helper to check if two VA can be merged Caterina Shablia
2025-07-07 19:00 ` Danilo Krummrich
2025-07-07 19:06 ` Danilo Krummrich
2025-08-21 12:18 ` Boris Brezillon
2025-08-21 12:06 ` Boris Brezillon
2025-07-22 19:17 ` Adrian Larumbe
2025-08-21 11:54 ` Boris Brezillon
2025-07-07 17:04 ` [PATCH v4 5/7] drm/gpuvm: Add a flags field to drm_gpuvm_map_req/drm_gpuva_op_map Caterina Shablia
2025-07-07 19:03 ` Danilo Krummrich
2025-07-22 19:21 ` Adrian Larumbe
2025-08-21 12:21 ` Boris Brezillon
2025-07-07 17:04 ` [PATCH v4 6/7] drm/gpuvm: Add DRM_GPUVA_REPEAT flag and logic Caterina Shablia
2025-07-07 19:33 ` Danilo Krummrich
2025-08-21 12:29 ` Boris Brezillon
2025-07-07 17:04 ` [PATCH v4 7/7] drm/panthor: Add support for repeated mappings Caterina Shablia
2025-07-11 14:03 ` Steven Price
2025-07-15 15:17 ` Caterina Shablia
2025-07-16 15:59 ` Steven Price