* [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing
@ 2025-10-15 16:03 Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 01/14] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
` (13 more replies)
0 siblings, 14 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
This series implements cached maps and explicit flushing for both panfrost
and panthor. To avoid code/bug duplication, the tricky guts of the cache
flushing ioctl which walk the sg list are broken into a new common shmem
helper which can be used by any driver.
The PanVK MR to use this lives here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385
Changes in v2:
- Expose the coherency so userspace can know when it should skip cache
maintenance
- Hook things up at drm_gem_object_funcs level to dma-buf cpu_prep hooks
can be implemented generically
- Revisit the semantics of the flags passed to gem_sync()
- Add BO_QUERY_INFO ioctls to query BO flags on imported objects and
let the UMD know when cache maintenance is needed on those
Changes in v3:
- New patch to fix panthor_gpu_coherency_set()
- No other major changes, check each patch changelog for more details
Changes in v4:
- Two trivial fixes, check each patch changelog for more details
Boris Brezillon (7):
drm/panthor: Fix panthor_gpu_coherency_set()
drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync()
helper
drm/prime: Provide default ::{begin,end}_cpu_access() implementations
drm/panthor: Expose the selected coherency protocol to the UMD
drm/panthor: Add an ioctl to query BO flags
drm/panfrost: Expose the selected coherency protocol to the UMD
drm/panfrost: Add an ioctl to query BO flags
Faith Ekstrand (6):
drm/shmem: Add a drm_gem_shmem_sync() helper
drm/panthor: Add a PANTHOR_BO_SYNC ioctl
drm/panthor: Bump the driver version to 1.6
drm/panfrost: Add a PANFROST_SYNC_BO ioctl
drm/panfrost: Add flag to map GEM object Write-Back Cacheable
drm/panfrost: Bump the driver version to 1.6
Loïc Molinari (1):
drm/panthor: Add flag to map GEM object Write-Back Cacheable
drivers/gpu/drm/drm_gem.c | 10 ++
drivers/gpu/drm/drm_gem_shmem_helper.c | 89 +++++++++++
drivers/gpu/drm/drm_prime.c | 42 +++++
drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
drivers/gpu/drm/panfrost/panfrost_drv.c | 106 ++++++++++++-
drivers/gpu/drm/panfrost/panfrost_gem.c | 23 +++
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +
drivers/gpu/drm/panfrost/panfrost_gpu.c | 26 +++-
drivers/gpu/drm/panfrost/panfrost_regs.h | 10 +-
drivers/gpu/drm/panthor/panthor_device.c | 10 +-
drivers/gpu/drm/panthor/panthor_drv.c | 84 +++++++++-
drivers/gpu/drm/panthor/panthor_gem.c | 24 +++
drivers/gpu/drm/panthor/panthor_gem.h | 3 +
drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
drivers/gpu/drm/panthor/panthor_sched.c | 18 ++-
include/drm/drm_gem.h | 45 ++++++
include/drm/drm_gem_shmem_helper.h | 11 ++
include/drm/drm_prime.h | 5 +
include/uapi/drm/panfrost_drm.h | 74 +++++++++
include/uapi/drm/panthor_drm.h | 171 ++++++++++++++++++++-
20 files changed, 738 insertions(+), 18 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v4 01/14] drm/panthor: Fix panthor_gpu_coherency_set()
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
` (12 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel, Akash Goel
GPU_COHERENCY_PROTOCOL takes one of GPU_COHERENCY_xx
not BIT(GPU_COHERENCY_xx).
v3:
- New commit
v4:
- Add Steve's R-b
Cc: Akash Goel <akash.goel@arm.com>
Fixes: dd7db8d911a1 ("drm/panthor: Explicitly set the coherency mode")
Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 9d98720ce03f..7f9a28e90409 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -49,7 +49,7 @@ struct panthor_gpu {
static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
{
gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
- ptdev->coherent ? GPU_COHERENCY_PROT_BIT(ACE_LITE) : GPU_COHERENCY_NONE);
+ ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
}
static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 01/14] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-16 8:13 ` Maxime Ripard
` (2 more replies)
2025-10-15 16:03 ` [PATCH v4 03/14] drm/prime: Provide default ::{begin, end}_cpu_access() implementations Boris Brezillon
` (11 subsequent siblings)
13 siblings, 3 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
Prepare things for standardizing synchronization around CPU accesses
of GEM buffers. This will be used to provide default
drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
a way for drivers to add their own ioctls to synchronize CPU
writes/reads when they can't do it directly from userland.
v2:
- New commit
v3:
- No changes
v4:
- Add Steve's R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/drm_gem.c | 10 +++++++++
include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index a1a9c828938b..a1431e4f2404 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
}
EXPORT_SYMBOL(drm_gem_vunmap);
+int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
+ enum drm_gem_object_access_flags access)
+{
+ if (obj->funcs->sync)
+ return obj->funcs->sync(obj, offset, size, access);
+
+ return 0;
+}
+EXPORT_SYMBOL(drm_gem_sync);
+
/**
* drm_gem_lock_reservations - Sets up the ww context and acquires
* the lock on an array of GEM objects.
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 8d48d2af2649..1c33e59ab305 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -66,6 +66,33 @@ enum drm_gem_object_status {
DRM_GEM_OBJECT_ACTIVE = BIT(2),
};
+/**
+ * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
+ */
+enum drm_gem_object_access_flags {
+ /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
+ DRM_GEM_OBJECT_CPU_ACCESS = 0,
+
+ /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
+ DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
+
+ /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
+ DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
+
+ /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
+ DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
+
+ /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
+ DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
+
+ /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
+ DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
+ DRM_GEM_OBJECT_WRITE_ACCESS,
+
+ /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
+ DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
+};
+
/**
* struct drm_gem_object_funcs - GEM object functions
*/
@@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
*/
int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
+ /**
+ * @sync:
+ *
+ * Prepare for CPU/device access. This can involve migration of
+ * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
+ * the CPU caches. The range can be used to optimize the synchronization
+ * when possible.
+ *
+ * Returns 0 on success, -errno otherwise.
+ *
+ * This callback is optional.
+ */
+ int (*sync)(struct drm_gem_object *obj, size_t offset, size_t size,
+ enum drm_gem_object_access_flags access);
+
/**
* @evict:
*
@@ -559,6 +601,9 @@ void drm_gem_unlock(struct drm_gem_object *obj);
int drm_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map);
void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map);
+int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
+ enum drm_gem_object_access_flags access);
+
int drm_gem_objects_lookup(struct drm_file *filp, void __user *bo_handles,
int count, struct drm_gem_object ***objs_out);
struct drm_gem_object *drm_gem_object_lookup(struct drm_file *filp, u32 handle);
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 03/14] drm/prime: Provide default ::{begin, end}_cpu_access() implementations
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 01/14] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 04/14] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
` (10 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
Hook-up drm_gem_dmabuf_{begin,end}_cpu_access() to drm_gem_sync() so
that drivers relying on the default prime_dmabuf_ops can still have
a way to prepare for CPU accesses from outside the UMD.
v2:
- New commit
v3:
- Don't return an error on NOP syncs, and document that case in a
comment
v4:
- Add Steve's R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/drm_prime.c | 42 +++++++++++++++++++++++++++++++++++++
include/drm/drm_prime.h | 5 +++++
2 files changed, 47 insertions(+)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 43a10b4af43a..30d495c70afb 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -823,6 +823,46 @@ int drm_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
}
EXPORT_SYMBOL(drm_gem_dmabuf_mmap);
+int drm_gem_dmabuf_begin_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction direction)
+{
+ struct drm_gem_object *obj = dma_buf->priv;
+ enum drm_gem_object_access_flags access = DRM_GEM_OBJECT_CPU_ACCESS;
+
+ /* begin_cpu_access(DMA_TO_DEVICE) is a NOP, the sync will happen
+ * in the end_cpu_access() path.
+ */
+ if (direction == DMA_FROM_DEVICE)
+ access |= DRM_GEM_OBJECT_READ_ACCESS;
+ else if (direction == DMA_BIDIRECTIONAL)
+ access |= DRM_GEM_OBJECT_RW_ACCESS;
+ else
+ return 0;
+
+ return drm_gem_sync(obj, 0, obj->size, access);
+}
+EXPORT_SYMBOL(drm_gem_dmabuf_begin_cpu_access);
+
+int drm_gem_dmabuf_end_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction direction)
+{
+ struct drm_gem_object *obj = dma_buf->priv;
+ enum drm_gem_object_access_flags access = DRM_GEM_OBJECT_DEV_ACCESS;
+
+ /* end_cpu_access(DMA_FROM_DEVICE) is a NOP, the sync should have
+ * happened in the begin_cpu_access() path already.
+ */
+ if (direction == DMA_TO_DEVICE)
+ access |= DRM_GEM_OBJECT_READ_ACCESS;
+ else if (direction == DMA_BIDIRECTIONAL)
+ access |= DRM_GEM_OBJECT_RW_ACCESS;
+ else
+ return 0;
+
+ return drm_gem_sync(obj, 0, obj->size, access);
+}
+EXPORT_SYMBOL(drm_gem_dmabuf_end_cpu_access);
+
static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
.attach = drm_gem_map_attach,
.detach = drm_gem_map_detach,
@@ -832,6 +872,8 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
+ .begin_cpu_access = drm_gem_dmabuf_begin_cpu_access,
+ .end_cpu_access = drm_gem_dmabuf_end_cpu_access,
};
/**
diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
index f50f862f0d8b..052fba039bb6 100644
--- a/include/drm/drm_prime.h
+++ b/include/drm/drm_prime.h
@@ -92,6 +92,11 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, struct iosys_map *map);
int drm_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
int drm_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma);
+int drm_gem_dmabuf_begin_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction direction);
+int drm_gem_dmabuf_end_cpu_access(struct dma_buf *dma_buf,
+ enum dma_data_direction direction);
+
struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
struct page **pages, unsigned int nr_pages);
struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 04/14] drm/shmem: Add a drm_gem_shmem_sync() helper
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (2 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 03/14] drm/prime: Provide default ::{begin, end}_cpu_access() implementations Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 05/14] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
` (9 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel,
Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
This enables syncing mapped GEM objects between the CPU and GPU via calls
to dma_sync_*(). It's a bit annoying as it requires walking the sg_table
so it's best if every driver doesn't hand-roll it.
When we're dealing with a dma-buf, the synchronization requests are
forwarded to the exporter through dma_buf_{begin,end}_cpu_access().
We provide a drm_gem_shmem_object_sync() for drivers that wants to
have this default implementation as their drm_gem_object_funcs::sync,
but we don't set drm_gem_shmem_funcs::sync to
drm_gem_shmem_object_sync() to avoid changing the behavior of drivers
currently relying on the default gem_shmem vtable.
v2:
- s/drm_gem_shmem_sync_mmap/drm_gem_shmem_sync/
- Change the prototype to match drm_gem_object_funcs::sync()
- Add a wrapper for drm_gem_object_funcs::sync()
v3:
- No changes
v4:
- Add Steve's R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 89 ++++++++++++++++++++++++++
include/drm/drm_gem_shmem_helper.h | 11 ++++
2 files changed, 100 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index dc94a27710e5..4094bd243cc8 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -690,6 +690,95 @@ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct
}
EXPORT_SYMBOL_GPL(drm_gem_shmem_mmap);
+/**
+ * drm_gem_shmem_sync - Sync CPU-mapped data to/from the device
+ * @shmem: shmem GEM object
+ * @offset: Offset into the GEM object
+ * @size: Size of the area to sync
+ * @access: Flags describing the access to sync for
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_shmem_sync(struct drm_gem_shmem_object *shmem, size_t offset,
+ size_t size, enum drm_gem_object_access_flags access)
+{
+ bool for_dev = (access & DRM_GEM_OBJECT_ACCESSOR_MASK) == DRM_GEM_OBJECT_DEV_ACCESS;
+ u32 access_type = access & DRM_GEM_OBJECT_ACCESS_TYPE_MASK;
+ struct drm_device *dev = shmem->base.dev;
+ enum dma_data_direction dir;
+ struct sg_table *sgt;
+ struct scatterlist *sgl;
+ unsigned int count;
+
+ if (access_type == DRM_GEM_OBJECT_RW_ACCESS)
+ dir = DMA_BIDIRECTIONAL;
+ else if (access_type == DRM_GEM_OBJECT_READ_ACCESS)
+ dir = for_dev ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ else if (access_type == DRM_GEM_OBJECT_WRITE_ACCESS)
+ dir = for_dev ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+ else
+ return 0;
+
+ /* Don't bother if it's WC-mapped */
+ if (shmem->map_wc)
+ return 0;
+
+ if (size == 0)
+ return 0;
+
+ if (offset + size < offset || offset + size > shmem->base.size)
+ return -EINVAL;
+
+ if (drm_gem_is_imported(&shmem->base)) {
+ /* We can't do fine-grained syncs with dma_buf and there's no
+ * easy way to guarantee that CPU caches/memory won't get
+ * impacted by the buffer-wide synchronization, so let's fail
+ * instead of pretending we can cope with that.
+ */
+ if (offset != 0 || size != shmem->base.size)
+ return -EINVAL;
+
+ struct dma_buf *dma_buf = shmem->base.import_attach->dmabuf;
+
+ if (for_dev)
+ return dma_buf_end_cpu_access(dma_buf, dir);
+ else
+ return dma_buf_begin_cpu_access(dma_buf, dir);
+ }
+
+ sgt = drm_gem_shmem_get_pages_sgt(shmem);
+ if (IS_ERR(sgt))
+ return PTR_ERR(sgt);
+
+ for_each_sgtable_dma_sg(sgt, sgl, count) {
+ if (size == 0)
+ break;
+
+ dma_addr_t paddr = sg_dma_address(sgl);
+ size_t len = sg_dma_len(sgl);
+
+ if (len <= offset) {
+ offset -= len;
+ continue;
+ }
+
+ paddr += offset;
+ len -= offset;
+ len = min_t(size_t, len, size);
+ size -= len;
+ offset = 0;
+
+ if (for_dev)
+ dma_sync_single_for_device(dev->dev, paddr, len, dir);
+ else
+ dma_sync_single_for_cpu(dev->dev, paddr, len, dir);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_sync);
+
/**
* drm_gem_shmem_print_info() - Print &drm_gem_shmem_object info for debugfs
* @shmem: shmem GEM object
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index 589f7bfe7506..6363e4ac9163 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -123,6 +123,8 @@ int drm_gem_shmem_vmap_locked(struct drm_gem_shmem_object *shmem,
void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
struct iosys_map *map);
int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma);
+int drm_gem_shmem_sync(struct drm_gem_shmem_object *shmem, size_t offset,
+ size_t size, enum drm_gem_object_access_flags access);
int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem);
void drm_gem_shmem_unpin_locked(struct drm_gem_shmem_object *shmem);
@@ -279,6 +281,15 @@ static inline int drm_gem_shmem_object_mmap(struct drm_gem_object *obj, struct v
return drm_gem_shmem_mmap(shmem, vma);
}
+static inline int
+drm_gem_shmem_object_sync(struct drm_gem_object *obj, size_t offset,
+ size_t size, enum drm_gem_object_access_flags access)
+{
+ struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+
+ return drm_gem_shmem_sync(shmem, offset, size, access);
+}
+
/*
* Driver ops
*/
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 05/14] drm/panthor: Expose the selected coherency protocol to the UMD
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (3 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 04/14] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 06/14] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
` (8 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
If we want to be able to skip CPU cache maintenance operations on
CPU-cached mappings, the UMD needs to know the kind of coherency
in place. Add a field to drm_panthor_gpu_info to do that. We can re-use
a padding field for that since this object is write-only from the
KMD perspective, and the UMD should just ignore it.
v2:
- New commit
v3:
- Make coherency protocol a real enum, not a bitmask
- Add BUILD_BUG_ON()s to make sure the values in panthor_regs.h and
those exposed through the uAPI match
v4:
- Add Steve's R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_device.c | 10 +++++-
drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
include/uapi/drm/panthor_drm.h | 39 ++++++++++++++++++++++--
3 files changed, 46 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index c7033d82cef5..492585014e9f 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -25,6 +25,12 @@
static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
{
+ BUILD_BUG_ON(GPU_COHERENCY_NONE != DRM_PANTHOR_GPU_COHERENCY_NONE);
+ BUILD_BUG_ON(GPU_COHERENCY_ACE_LITE != DRM_PANTHOR_GPU_COHERENCY_ACE_LITE);
+ BUILD_BUG_ON(GPU_COHERENCY_ACE != DRM_PANTHOR_GPU_COHERENCY_ACE);
+
+ /* Start with no coherency, and update it if the device is flagged coherent. */
+ ptdev->gpu_info.selected_coherency = GPU_COHERENCY_NONE;
ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT;
if (!ptdev->coherent)
@@ -34,8 +40,10 @@ static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
* ACE protocol has never been supported for command stream frontend GPUs.
*/
if ((gpu_read(ptdev, GPU_COHERENCY_FEATURES) &
- GPU_COHERENCY_PROT_BIT(ACE_LITE)))
+ GPU_COHERENCY_PROT_BIT(ACE_LITE))) {
+ ptdev->gpu_info.selected_coherency = GPU_COHERENCY_ACE_LITE;
return 0;
+ }
drm_err(&ptdev->base, "Coherency not supported by the device");
return -ENOTSUPP;
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 7f9a28e90409..a95c0b94ef58 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -49,7 +49,7 @@ struct panthor_gpu {
static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
{
gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
- ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
+ ptdev->gpu_info.selected_coherency);
}
static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 467d365ed7ba..f0f637e0631d 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -245,6 +245,26 @@ enum drm_panthor_dev_query_type {
DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
};
+/**
+ * enum drm_panthor_gpu_coherency: Type of GPU coherency
+ */
+enum drm_panthor_gpu_coherency {
+ /**
+ * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE Lite coherency.
+ */
+ DRM_PANTHOR_GPU_COHERENCY_ACE_LITE = 0,
+
+ /**
+ * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE coherency.
+ */
+ DRM_PANTHOR_GPU_COHERENCY_ACE = 1,
+
+ /**
+ * @DRM_PANTHOR_GPU_COHERENCY_NONE: No coherency.
+ */
+ DRM_PANTHOR_GPU_COHERENCY_NONE = 31,
+};
+
/**
* struct drm_panthor_gpu_info - GPU information
*
@@ -301,7 +321,16 @@ struct drm_panthor_gpu_info {
*/
__u32 thread_max_barrier_size;
- /** @coherency_features: Coherency features. */
+ /**
+ * @coherency_features: Coherency features.
+ *
+ * Combination of drm_panthor_gpu_coherency flags.
+ *
+ * Note that this is just what the coherency protocols supported by the
+ * GPU, but the actual coherency in place depends on the SoC
+ * integration and is reflected by
+ * drm_panthor_gpu_info::selected_coherency.
+ */
__u32 coherency_features;
/** @texture_features: Texture features. */
@@ -310,8 +339,12 @@ struct drm_panthor_gpu_info {
/** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */
__u32 as_present;
- /** @pad0: MBZ. */
- __u32 pad0;
+ /**
+ * @select_coherency: Coherency selected for this device.
+ *
+ * One of drm_panthor_gpu_coherency.
+ */
+ __u32 selected_coherency;
/** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */
__u64 shader_present;
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 06/14] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (4 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 05/14] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 07/14] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
` (7 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel,
Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
This will be used by the UMD to synchronize CPU-cached mappings when
the UMD can't do it directly (no usermode cache maintenance instruction
on Arm32).
v2:
- Change the flags so they better match the drm_gem_shmem_sync()
semantics
v3:
- Add Steve's R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 47 ++++++++++++++++++-
drivers/gpu/drm/panthor/panthor_gem.c | 21 +++++++++
drivers/gpu/drm/panthor/panthor_gem.h | 3 ++
include/uapi/drm/panthor_drm.h | 66 +++++++++++++++++++++++++++
4 files changed, 136 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index fb4b293f17f0..857954d2ac7b 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
- PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
+ PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_bo_sync_op, size))
/**
* PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
@@ -1394,6 +1395,49 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
return 0;
}
+#define PANTHOR_BO_SYNC_OP_FLAGS \
+ (DRM_PANTHOR_BO_SYNC_FOR_DEV | DRM_PANTHOR_BO_SYNC_FOR_READ | \
+ DRM_PANTHOR_BO_SYNC_FOR_WRITE)
+
+static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct drm_panthor_bo_sync *args = data;
+ struct drm_panthor_bo_sync_op *ops;
+ struct drm_gem_object *obj;
+ int ret = 0;
+
+ ret = PANTHOR_UOBJ_GET_ARRAY(ops, &args->ops);
+ if (ret)
+ return ret;
+
+ for (u32 i = 0; i < args->ops.count; i++) {
+ if (ops[i].flags & ~PANTHOR_BO_SYNC_OP_FLAGS) {
+ ret = -EINVAL;
+ goto err_ops;
+ }
+
+ obj = drm_gem_object_lookup(file, ops[i].handle);
+ if (!obj) {
+ ret = -ENOENT;
+ goto err_ops;
+ }
+
+ ret = panthor_gem_bo_sync(obj, ops[i].flags,
+ ops[i].offset, ops[i].size);
+
+ drm_gem_object_put(obj);
+
+ if (ret)
+ goto err_ops;
+ }
+
+err_ops:
+ kvfree(ops);
+
+ return ret;
+}
+
static int
panthor_open(struct drm_device *ddev, struct drm_file *file)
{
@@ -1468,6 +1512,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
+ PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
};
static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 156c7a0b62a2..617e04134d30 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -338,6 +338,27 @@ panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label)
panthor_gem_bo_set_label(bo->obj, str);
}
+int
+panthor_gem_bo_sync(struct drm_gem_object *obj, u32 flags,
+ u64 offset, u64 size)
+{
+ enum drm_gem_object_access_flags gem_access_flags = 0;
+ struct panthor_gem_object *bo = to_panthor_bo(obj);
+
+ if (flags & DRM_PANTHOR_BO_SYNC_FOR_DEV)
+ gem_access_flags |= DRM_GEM_OBJECT_DEV_ACCESS;
+ else
+ gem_access_flags |= DRM_GEM_OBJECT_CPU_ACCESS;
+
+ if (flags & DRM_PANTHOR_BO_SYNC_FOR_READ)
+ gem_access_flags |= DRM_GEM_OBJECT_READ_ACCESS;
+
+ if (flags & DRM_PANTHOR_BO_SYNC_FOR_WRITE)
+ gem_access_flags |= DRM_GEM_OBJECT_WRITE_ACCESS;
+
+ return drm_gem_shmem_sync(&bo->base, offset, size, gem_access_flags);
+}
+
#ifdef CONFIG_DEBUG_FS
struct gem_size_totals {
size_t size;
diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
index 80c6e24112d0..c08c7c066840 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.h
+++ b/drivers/gpu/drm/panthor/panthor_gem.h
@@ -147,6 +147,9 @@ panthor_gem_create_with_handle(struct drm_file *file,
void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
+int panthor_gem_bo_sync(struct drm_gem_object *obj, u32 flags,
+ u64 offset, u64 size);
+
static inline u64
panthor_kernel_bo_gpuva(struct panthor_kernel_bo *bo)
{
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index f0f637e0631d..c484a0204f3b 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -144,6 +144,9 @@ enum drm_panthor_ioctl_id {
* pgoff_t size.
*/
DRM_PANTHOR_SET_USER_MMIO_OFFSET,
+
+ /** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
+ DRM_PANTHOR_BO_SYNC,
};
/**
@@ -1073,6 +1076,67 @@ struct drm_panthor_set_user_mmio_offset {
__u64 offset;
};
+/**
+ * enum drm_panthor_bo_sync_op_flags - BO sync flags
+ */
+enum drm_panthor_bo_sync_op_flags {
+ /**
+ * @DRM_PANTHOR_BO_SYNC_FOR_CPU: Sync data for CPU accesses.
+ */
+ DRM_PANTHOR_BO_SYNC_FOR_CPU = (0 << 0),
+
+ /**
+ * @DRM_PANTHOR_BO_SYNC_FOR_DEV: Sync data for device accesses.
+ */
+ DRM_PANTHOR_BO_SYNC_FOR_DEV = (1 << 0),
+
+ /**
+ * @DRM_PANTHOR_BO_SYNC_FOR_READ: Sync for reads.
+ */
+ DRM_PANTHOR_BO_SYNC_FOR_READ = (1 << 1),
+
+ /**
+ * @DRM_PANTHOR_BO_SYNC_FOR_WRITE: Sync for writes.
+ */
+ DRM_PANTHOR_BO_SYNC_FOR_WRITE = (1 << 2),
+};
+
+/**
+ * struct drm_panthor_bo_sync_op - BO map sync op
+ */
+struct drm_panthor_bo_sync_op {
+ /** @handle: Handle of the buffer object to sync. */
+ __u32 handle;
+
+ /** @flags: Flags controlling the sync operation. */
+ __u32 flags;
+
+ /**
+ * @offset: Offset into the BO at which the sync range starts.
+ *
+ * This will be rounded down to the nearest cache line as needed.
+ */
+ __u64 offset;
+
+ /**
+ * @size: Size of the range to sync
+ *
+ * @size + @offset will be rounded up to the nearest cache line as
+ * needed.
+ */
+ __u64 size;
+};
+
+/**
+ * struct drm_panthor_bo_sync - BO map sync request
+ */
+struct drm_panthor_bo_sync {
+ /**
+ * @ops: Array of struct drm_panthor_bo_sync_op sync operations.
+ */
+ struct drm_panthor_obj_array ops;
+};
+
/**
* DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
* @__access: Access type. Must be R, W or RW.
@@ -1119,6 +1183,8 @@ enum {
DRM_IOCTL_PANTHOR(WR, BO_SET_LABEL, bo_set_label),
DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET =
DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
+ DRM_IOCTL_PANTHOR_BO_SYNC =
+ DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
};
#if defined(__cplusplus)
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 07/14] drm/panthor: Add an ioctl to query BO flags
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (5 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 06/14] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 08/14] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
` (6 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
This is useful when importing BOs, so we can know about cacheability
and flush the caches when needed.
We can also know when the buffer comes from a different subsystem and
take proper actions (avoid CPU mappings, or do kernel-based syncs
instead of userland cache flushes).
v2:
- New commit
v3:
- Add Steve's R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 24 +++++++++++
include/uapi/drm/panthor_drm.h | 57 +++++++++++++++++++++++++++
2 files changed, 81 insertions(+)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 857954d2ac7b..9004d0ba0e45 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1438,6 +1438,29 @@ static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
return ret;
}
+static int panthor_ioctl_bo_query_info(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct drm_panthor_bo_query_info *args = data;
+ struct panthor_gem_object *bo;
+ struct drm_gem_object *obj;
+
+ obj = drm_gem_object_lookup(file, args->handle);
+ if (!obj)
+ return -ENOENT;
+
+ bo = to_panthor_bo(obj);
+ args->pad = 0;
+ args->create_flags = bo->flags;
+
+ args->extra_flags = 0;
+ if (drm_gem_is_imported(&bo->base.base))
+ args->extra_flags |= DRM_PANTHOR_BO_IS_IMPORTED;
+
+ drm_gem_object_put(obj);
+ return 0;
+}
+
static int
panthor_open(struct drm_device *ddev, struct drm_file *file)
{
@@ -1513,6 +1536,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
+ PANTHOR_IOCTL(BO_QUERY_INFO, bo_query_info, DRM_RENDER_ALLOW),
};
static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index c484a0204f3b..63ea44ab961d 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -147,6 +147,13 @@ enum drm_panthor_ioctl_id {
/** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
DRM_PANTHOR_BO_SYNC,
+
+ /**
+ * @DRM_PANTHOR_BO_QUERY_INFO: Query information about a BO.
+ *
+ * This is useful for imported BOs.
+ */
+ DRM_PANTHOR_BO_QUERY_INFO,
};
/**
@@ -1137,6 +1144,54 @@ struct drm_panthor_bo_sync {
struct drm_panthor_obj_array ops;
};
+/**
+ * enum drm_panthor_bo_extra_flags - Set of flags returned on a BO_QUERY_INFO request
+ *
+ * Those are flags reflecting BO properties that are not directly coming from the flags
+ * passed are creation time, or information on BOs that were imported from other drivers.
+ */
+enum drm_panthor_bo_extra_flags {
+ /**
+ * @DRM_PANTHOR_BO_IS_IMPORTED: BO has been imported from an external driver.
+ *
+ * Note that imported dma-buf handles are not flagged as imported if they
+ * where exported by panthor. Only buffers that are coming from other drivers
+ * (dma heaps, other GPUs, display controllers, V4L, ...).
+ *
+ * It's also important to note that all imported BOs are mapped cached and can't
+ * be considered IO-coherent even if the GPU is. This means they require explicit
+ * syncs that must go through the DRM_PANTHOR_BO_SYNC ioctl (userland cache
+ * maintenance is not allowed in that case, because extra operations might be
+ * needed to make changes visible to the CPU/device, like buffer migration when the
+ * exporter is a GPU with its own VRAM).
+ */
+ DRM_PANTHOR_BO_IS_IMPORTED = (1 << 0),
+};
+
+/**
+ * struct drm_panthor_bo_query_info - Query BO info
+ */
+struct drm_panthor_bo_query_info {
+ /** @handle: Handle of the buffer object to query flags on. */
+ __u32 handle;
+
+ /**
+ * @extra_flags: Combination of enum drm_panthor_bo_extra_flags flags.
+ */
+ __u32 extra_flags;
+
+ /**
+ * @create_flags: Flags passed at creation time.
+ *
+ * Combination of enum drm_panthor_bo_flags flags.
+ * Will be zero if the buffer comes from a different driver.
+ */
+ __u32 create_flags;
+
+ /** @pad: Will be zero on return. */
+ __u32 pad;
+};
+
/**
* DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
* @__access: Access type. Must be R, W or RW.
@@ -1185,6 +1240,8 @@ enum {
DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
DRM_IOCTL_PANTHOR_BO_SYNC =
DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
+ DRM_IOCTL_PANTHOR_BO_QUERY_INFO =
+ DRM_IOCTL_PANTHOR(WR, BO_QUERY_INFO, bo_query_info),
};
#if defined(__cplusplus)
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 08/14] drm/panthor: Add flag to map GEM object Write-Back Cacheable
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (6 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 07/14] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 09/14] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
` (5 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Loïc Molinari,
kernel, Boris Brezillon
From: Loïc Molinari <loic.molinari@collabora.com>
Will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or on which the access
pattern makes non-cacheable mappings inefficient.
Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
v2:
- Add more to the commit message
- Tweak the doc
- Make sure we sync the section of the BO pointing to the CS
syncobj before we read its seqno
v3:
- Fix formatting/spelling issues
v4:
- Add Steve's R-b
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 7 ++++++-
drivers/gpu/drm/panthor/panthor_gem.c | 3 +++
drivers/gpu/drm/panthor/panthor_sched.c | 18 ++++++++++++++++--
include/uapi/drm/panthor_drm.h | 9 +++++++++
4 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 9004d0ba0e45..d2c321404a75 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -900,7 +900,8 @@ static int panthor_ioctl_vm_destroy(struct drm_device *ddev, void *data,
return panthor_vm_pool_destroy_vm(pfile->vms, args->id);
}
-#define PANTHOR_BO_FLAGS DRM_PANTHOR_BO_NO_MMAP
+#define PANTHOR_BO_FLAGS (DRM_PANTHOR_BO_NO_MMAP | \
+ DRM_PANTHOR_BO_WB_MMAP)
static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
struct drm_file *file)
@@ -919,6 +920,10 @@ static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
goto out_dev_exit;
}
+ if ((args->flags & DRM_PANTHOR_BO_NO_MMAP) &&
+ (args->flags & DRM_PANTHOR_BO_WB_MMAP))
+ return -EINVAL;
+
if (args->exclusive_vm_id) {
vm = panthor_vm_pool_get_vm(pfile->vms, args->exclusive_vm_id);
if (!vm) {
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 617e04134d30..a0ccc316e375 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -280,6 +280,9 @@ panthor_gem_create_with_handle(struct drm_file *file,
bo = to_panthor_bo(&shmem->base);
bo->flags = flags;
+ if (flags & DRM_PANTHOR_BO_WB_MMAP)
+ shmem->map_wc = false;
+
if (exclusive_vm) {
bo->exclusive_vm_root_gem = panthor_vm_root_gem(exclusive_vm);
drm_gem_object_get(bo->exclusive_vm_root_gem);
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index f5e01cb16cfc..a17427cabf2a 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -868,8 +868,11 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
struct iosys_map map;
int ret;
- if (queue->syncwait.kmap)
- return queue->syncwait.kmap + queue->syncwait.offset;
+ if (queue->syncwait.kmap) {
+ bo = container_of(queue->syncwait.obj,
+ struct panthor_gem_object, base.base);
+ goto out_sync;
+ }
bo = panthor_vm_get_bo_for_va(group->vm,
queue->syncwait.gpu_va,
@@ -886,6 +889,17 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
if (drm_WARN_ON(&ptdev->base, !queue->syncwait.kmap))
goto err_put_syncwait_obj;
+out_sync:
+ /* Make sure the CPU caches are invalidated before the seqno is read.
+ * drm_gem_shmem_sync() is a NOP if map_wc=false, so no need to check
+ * it here.
+ */
+ drm_gem_shmem_sync(&bo->base, queue->syncwait.offset,
+ queue->syncwait.sync64 ?
+ sizeof(struct panthor_syncobj_64b) :
+ sizeof(struct panthor_syncobj_32b),
+ DRM_GEM_OBJECT_CPU_ACCESS | DRM_GEM_OBJECT_READ_ACCESS);
+
return queue->syncwait.kmap + queue->syncwait.offset;
err_put_syncwait_obj:
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 63ea44ab961d..dca162781bb2 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -681,6 +681,15 @@ struct drm_panthor_vm_get_state {
enum drm_panthor_bo_flags {
/** @DRM_PANTHOR_BO_NO_MMAP: The buffer object will never be CPU-mapped in userspace. */
DRM_PANTHOR_BO_NO_MMAP = (1 << 0),
+
+ /**
+ * @DRM_PANTHOR_BO_WB_MMAP: Force "Write-Back Cacheable" CPU mapping.
+ *
+ * CPU map the buffer object in userspace by forcing the "Write-Back
+ * Cacheable" cacheability attribute. The mapping otherwise uses the
+ * "Non-Cacheable" attribute if the GPU is not IO coherent.
+ */
+ DRM_PANTHOR_BO_WB_MMAP = (1 << 1),
};
/**
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 09/14] drm/panthor: Bump the driver version to 1.6
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (7 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 08/14] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
` (4 subsequent siblings)
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel,
Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
Bump the driver version to reflect the new cached-CPU mapping
capability.
v2:
- Quickly describe what the new version exposes in the commit message
v3:
- Add Steve's R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panthor/panthor_drv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index d2c321404a75..9d626ce66ad7 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1675,6 +1675,10 @@ static void panthor_debugfs_init(struct drm_minor *minor)
* - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
* - 1.4 - adds DRM_IOCTL_PANTHOR_BO_SET_LABEL ioctl
* - 1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
+ * - 1.6 - adds DRM_PANTHOR_BO_WB_MMAP flag
+ * - adds DRM_IOCTL_PANTHOR_BO_SYNC ioctl
+ * - adds DRM_IOCTL_PANTHOR_BO_QUERY_INFO ioctl
+ * - adds drm_panthor_gpu_info::selected_coherency
*/
static const struct drm_driver panthor_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -1688,7 +1692,7 @@ static const struct drm_driver panthor_drm_driver = {
.name = "panthor",
.desc = "Panthor DRM driver",
.major = 1,
- .minor = 5,
+ .minor = 6,
.gem_create_object = panthor_gem_create_object,
.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (8 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 09/14] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:06 ` Steven Price
2025-10-15 16:03 ` [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
` (3 subsequent siblings)
13 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
Will be needed if we want to skip CPU cache maintenance operations when
the GPU can snoop CPU caches.
v2:
- New commit
v3:
- Fix the coherency values (enum instead of bitmask)
v4:
- Fix init/test on coherency_features
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
drivers/gpu/drm/panfrost/panfrost_gpu.c | 26 +++++++++++++++++++---
drivers/gpu/drm/panfrost/panfrost_regs.h | 10 +++++++--
include/uapi/drm/panfrost_drm.h | 7 ++++++
5 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index 1e73efad02a8..bd38b7ae169e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -75,6 +75,7 @@ struct panfrost_features {
u32 thread_max_workgroup_sz;
u32 thread_max_barrier_sz;
u32 coherency_features;
+ u32 selected_coherency;
u32 afbc_features;
u32 texture_features[4];
u32 js_features[16];
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 607a5b8448d0..3ffcd08f7745 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -94,6 +94,7 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
PANFROST_FEATURE_ARRAY(JS_FEATURES, js_features, 15);
PANFROST_FEATURE(NR_CORE_GROUPS, nr_core_groups);
PANFROST_FEATURE(THREAD_TLS_ALLOC, thread_tls_alloc);
+ PANFROST_FEATURE(SELECTED_COHERENCY, selected_coherency);
case DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP:
ret = panfrost_ioctl_query_timestamp(pfdev, ¶m->value);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 174e190ba40f..f350a913c786 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -157,8 +157,8 @@ static void panfrost_gpu_init_quirks(struct panfrost_device *pfdev)
pfdev->features.revision >= 0x2000)
quirks |= JM_MAX_JOB_THROTTLE_LIMIT << JM_JOB_THROTTLE_LIMIT_SHIFT;
else if (panfrost_model_eq(pfdev, 0x6000) &&
- pfdev->features.coherency_features == COHERENCY_ACE)
- quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
+ pfdev->features.coherency_features == BIT(COHERENCY_ACE))
+ quirks |= (BIT(COHERENCY_ACE_LITE) | BIT(COHERENCY_ACE)) <<
JM_FORCE_COHERENCY_FEATURES_SHIFT;
if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
@@ -260,7 +260,27 @@ static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
pfdev->features.max_threads = gpu_read(pfdev, GPU_THREAD_MAX_THREADS);
pfdev->features.thread_max_workgroup_sz = gpu_read(pfdev, GPU_THREAD_MAX_WORKGROUP_SIZE);
pfdev->features.thread_max_barrier_sz = gpu_read(pfdev, GPU_THREAD_MAX_BARRIER_SIZE);
- pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+
+ if (panfrost_has_hw_feature(pfdev, HW_FEATURE_COHERENCY_REG))
+ pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+ else
+ pfdev->features.coherency_features = BIT(COHERENCY_ACE_LITE);
+
+ BUILD_BUG_ON(COHERENCY_ACE_LITE != DRM_PANFROST_GPU_COHERENCY_ACE_LITE);
+ BUILD_BUG_ON(COHERENCY_ACE != DRM_PANFROST_GPU_COHERENCY_ACE);
+ BUILD_BUG_ON(COHERENCY_NONE != DRM_PANFROST_GPU_COHERENCY_NONE);
+
+ if (!pfdev->coherent) {
+ pfdev->features.selected_coherency = COHERENCY_NONE;
+ } else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE)) {
+ pfdev->features.selected_coherency = COHERENCY_ACE;
+ } else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE_LITE)) {
+ pfdev->features.selected_coherency = COHERENCY_ACE_LITE;
+ } else {
+ drm_WARN(pfdev->ddev, true, "No known coherency protocol supported");
+ pfdev->features.selected_coherency = COHERENCY_NONE;
+ }
+
pfdev->features.afbc_features = gpu_read(pfdev, GPU_AFBC_FEATURES);
for (i = 0; i < 4; i++)
pfdev->features.texture_features[i] = gpu_read(pfdev, GPU_TEXTURE_FEATURES(i));
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 2b8f1617b836..ee15f6bf6e6f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -102,9 +102,15 @@
#define GPU_L2_PRESENT_LO 0x120 /* (RO) Level 2 cache present bitmap, low word */
#define GPU_L2_PRESENT_HI 0x124 /* (RO) Level 2 cache present bitmap, high word */
+/* GPU_COHERENCY_FEATURES is a bitmask of BIT(COHERENCY_xxx) values encoding the
+ * set of supported coherency protocols. GPU_COHERENCY_ENABLE is passed a
+ * COHERENCY_xxx value.
+ */
#define GPU_COHERENCY_FEATURES 0x300 /* (RO) Coherency features present */
-#define COHERENCY_ACE_LITE BIT(0)
-#define COHERENCY_ACE BIT(1)
+#define GPU_COHERENCY_ENABLE 0x304 /* (RW) Coherency protocol selection */
+#define COHERENCY_ACE_LITE 0
+#define COHERENCY_ACE 1
+#define COHERENCY_NONE 31
#define GPU_STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */
#define GPU_STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index e8b47c9f6976..9bd8fa401400 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -188,6 +188,13 @@ enum drm_panfrost_param {
DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
+ DRM_PANFROST_PARAM_SELECTED_COHERENCY,
+};
+
+enum drm_panfrost_gpu_coherency {
+ DRM_PANFROST_GPU_COHERENCY_ACE_LITE = 0,
+ DRM_PANFROST_GPU_COHERENCY_ACE = 1,
+ DRM_PANFROST_GPU_COHERENCY_NONE = 31,
};
struct drm_panfrost_get_param {
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (9 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-16 8:42 ` Marcin Ślusarz
2025-10-15 16:03 ` [PATCH v4 12/14] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
` (2 subsequent siblings)
13 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel,
Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
This will be used by the UMD to synchronize CPU-cached mappings when
the UMD can't do it directly (no usermode cache maintenance instruction
on Arm32).
v2:
- Add more to the commit message
- Change the flags to better match the drm_gem_shmem_sync semantics
v3:
- Add Steve's R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 57 +++++++++++++++++++++++++
drivers/gpu/drm/panfrost/panfrost_gem.c | 20 +++++++++
drivers/gpu/drm/panfrost/panfrost_gem.h | 2 +
include/uapi/drm/panfrost_drm.h | 47 ++++++++++++++++++++
4 files changed, 126 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 3ffcd08f7745..cabf544f0437 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -579,6 +579,62 @@ static int panfrost_ioctl_jm_ctx_destroy(struct drm_device *dev, void *data,
return panfrost_jm_ctx_destroy(file, args->handle);
}
+#define PANFROST_BO_SYNC_OP_FLAGS \
+ (PANFROST_BO_SYNC_FOR_DEV | PANFROST_BO_SYNC_FOR_READ | \
+ PANFROST_BO_SYNC_FOR_WRITE)
+
+static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct drm_panfrost_sync_bo *args = data;
+ struct drm_panfrost_bo_sync_op *ops;
+ struct drm_gem_object *obj;
+ int ret;
+ u32 i;
+
+ if (args->pad)
+ return -EINVAL;
+
+ ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
+ if (!ops) {
+ DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
+ args->op_count * sizeof(*ops))) {
+ DRM_DEBUG("Failed to copy in BO sync ops\n");
+ ret = -EFAULT;
+ goto err_ops;
+ }
+
+ for (i = 0; i < args->op_count; i++) {
+ if (ops[i].flags & ~PANFROST_BO_SYNC_OP_FLAGS) {
+ ret = -EINVAL;
+ goto err_ops;
+ }
+
+ obj = drm_gem_object_lookup(file, ops[i].handle);
+ if (!obj) {
+ ret = -ENOENT;
+ goto err_ops;
+ }
+
+ ret = panfrost_gem_sync(obj, ops[i].flags,
+ ops[i].offset, ops[i].size);
+
+ drm_gem_object_put(obj);
+
+ if (ret)
+ goto err_ops;
+ }
+
+err_ops:
+ kvfree(ops);
+
+ return ret;
+}
+
int panfrost_unstable_ioctl_check(void)
{
if (!unstable_ioctls)
@@ -648,6 +704,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(SET_LABEL_BO, set_label_bo, DRM_RENDER_ALLOW),
PANFROST_IOCTL(JM_CTX_CREATE, jm_ctx_create, DRM_RENDER_ALLOW),
PANFROST_IOCTL(JM_CTX_DESTROY, jm_ctx_destroy, DRM_RENDER_ALLOW),
+ PANFROST_IOCTL(SYNC_BO, sync_bo, DRM_RENDER_ALLOW),
};
static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 85d6289a6eda..da0362202d94 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -365,6 +365,26 @@ panfrost_gem_set_label(struct drm_gem_object *obj, const char *label)
kfree_const(old_label);
}
+int
+panfrost_gem_sync(struct drm_gem_object *obj, u32 flags,
+ u32 offset, u32 size)
+{
+ struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+ enum drm_gem_object_access_flags gem_access_flags = 0;
+
+ if (flags & PANFROST_BO_SYNC_FOR_DEV)
+ gem_access_flags |= DRM_GEM_OBJECT_DEV_ACCESS;
+ else
+ gem_access_flags |= DRM_GEM_OBJECT_CPU_ACCESS;
+
+ if (flags & PANFROST_BO_SYNC_FOR_READ)
+ gem_access_flags |= DRM_GEM_OBJECT_READ_ACCESS;
+ if (flags & PANFROST_BO_SYNC_FOR_WRITE)
+ gem_access_flags |= DRM_GEM_OBJECT_WRITE_ACCESS;
+
+ return drm_gem_shmem_sync(&bo->base, offset, size, gem_access_flags);
+}
+
void
panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label)
{
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 8de3e76f2717..6a0e090aa59b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -148,6 +148,8 @@ int panfrost_gem_shrinker_init(struct drm_device *dev);
void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
void panfrost_gem_set_label(struct drm_gem_object *obj, const char *label);
+int panfrost_gem_sync(struct drm_gem_object *obj, u32 flags,
+ u32 offset, u32 size);
void panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label);
#ifdef CONFIG_DEBUG_FS
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 9bd8fa401400..7ebb74e750d1 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -24,6 +24,7 @@ extern "C" {
#define DRM_PANFROST_SET_LABEL_BO 0x09
#define DRM_PANFROST_JM_CTX_CREATE 0x0a
#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
+#define DRM_PANFROST_SYNC_BO 0x0c
#define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
#define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -35,6 +36,7 @@ extern "C" {
#define DRM_IOCTL_PANFROST_SET_LABEL_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
+#define DRM_IOCTL_PANFROST_SYNC_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
/*
* Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -266,6 +268,51 @@ struct drm_panfrost_set_label_bo {
__u64 label;
};
+/* Valid flags to pass to drm_panfrost_bo_sync_op */
+#define PANFROST_BO_SYNC_FOR_CPU (0 << 0)
+#define PANFROST_BO_SYNC_FOR_DEV (1 << 0)
+#define PANFROST_BO_SYNC_FOR_READ (1 << 1)
+#define PANFROST_BO_SYNC_FOR_WRITE (1 << 2)
+
+/**
+ * struct drm_panthor_bo_flush_map_op - BO map sync op
+ */
+struct drm_panfrost_bo_sync_op {
+ /** @handle: Handle of the buffer object to sync. */
+ __u32 handle;
+
+ /** @flags: Flags controlling the sync operation. */
+ __u32 flags;
+
+ /**
+ * @offset: Offset into the BO at which the sync range starts.
+ *
+ * This will be rounded down to the nearest cache line as needed.
+ */
+ __u32 offset;
+
+ /**
+ * @size: Size of the range to sync
+ *
+ * @size + @offset will be rounded up to the nearest cache line as
+ * needed.
+ */
+ __u32 size;
+};
+
+/**
+ * struct drm_panfrost_sync_bo - ioctl argument for syncing BO maps
+ */
+struct drm_panfrost_sync_bo {
+ /** Array of struct drm_panfrost_bo_sync_op */
+ __u64 ops;
+
+ /** Number of BO sync ops */
+ __u32 op_count;
+
+ __u32 pad;
+};
+
/* Definitions for coredump decoding in user space */
#define PANFROSTDUMP_MAJOR 1
#define PANFROSTDUMP_MINOR 0
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 12/14] drm/panfrost: Add an ioctl to query BO flags
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (10 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 14/14] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, Boris Brezillon,
kernel
This is useful when importing BOs, so we can know about cacheability
and flush the caches when needed.
v2:
- New commit
v3:
- Add Steve's R-b
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 33 +++++++++++++++++++++++++
include/uapi/drm/panfrost_drm.h | 19 ++++++++++++++
2 files changed, 52 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index cabf544f0437..00c0881fa2f0 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -635,6 +635,38 @@ static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
return ret;
}
+static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+ struct drm_panfrost_query_bo_info *args = data;
+ struct drm_gem_object *gem_obj;
+ struct panfrost_gem_object *bo;
+
+ gem_obj = drm_gem_object_lookup(file_priv, args->handle);
+ if (!gem_obj) {
+ DRM_DEBUG("Failed to look up GEM BO %d\n", args->handle);
+ return -ENOENT;
+ }
+
+ bo = to_panfrost_bo(gem_obj);
+ args->pad = 0;
+ args->create_flags = 0;
+ args->extra_flags = 0;
+
+ if (drm_gem_is_imported(gem_obj)) {
+ args->extra_flags |= DRM_PANFROST_BO_IS_IMPORTED;
+ } else {
+ if (bo->noexec)
+ args->create_flags |= PANFROST_BO_NOEXEC;
+
+ if (bo->is_heap)
+ args->create_flags |= PANFROST_BO_HEAP;
+ }
+
+ drm_gem_object_put(gem_obj);
+ return 0;
+}
+
int panfrost_unstable_ioctl_check(void)
{
if (!unstable_ioctls)
@@ -705,6 +737,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
PANFROST_IOCTL(JM_CTX_CREATE, jm_ctx_create, DRM_RENDER_ALLOW),
PANFROST_IOCTL(JM_CTX_DESTROY, jm_ctx_destroy, DRM_RENDER_ALLOW),
PANFROST_IOCTL(SYNC_BO, sync_bo, DRM_RENDER_ALLOW),
+ PANFROST_IOCTL(QUERY_BO_INFO, query_bo_info, DRM_RENDER_ALLOW),
};
static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 7ebb74e750d1..e7d01e744efd 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -25,6 +25,7 @@ extern "C" {
#define DRM_PANFROST_JM_CTX_CREATE 0x0a
#define DRM_PANFROST_JM_CTX_DESTROY 0x0b
#define DRM_PANFROST_SYNC_BO 0x0c
+#define DRM_PANFROST_QUERY_BO_INFO 0x0d
#define DRM_IOCTL_PANFROST_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
#define DRM_IOCTL_PANFROST_WAIT_BO DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -37,6 +38,7 @@ extern "C" {
#define DRM_IOCTL_PANFROST_JM_CTX_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
#define DRM_IOCTL_PANFROST_JM_CTX_DESTROY DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
#define DRM_IOCTL_PANFROST_SYNC_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
+#define DRM_IOCTL_PANFROST_QUERY_BO_INFO DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_QUERY_BO_INFO, struct drm_panfrost_query_bo_info)
/*
* Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -313,6 +315,23 @@ struct drm_panfrost_sync_bo {
__u32 pad;
};
+/** BO comes from a different subsystem. */
+#define DRM_PANFROST_BO_IS_IMPORTED (1 << 0)
+
+struct drm_panfrost_query_bo_info {
+ /** Handle of the object being queried. */
+ __u32 handle;
+
+ /** Extra flags that are not coming from the BO_CREATE ioctl(). */
+ __u32 extra_flags;
+
+ /** Flags passed at creation time. */
+ __u32 create_flags;
+
+ /** Will be zero on return. */
+ __u32 pad;
+};
+
/* Definitions for coredump decoding in user space */
#define PANFROSTDUMP_MAJOR 1
#define PANFROSTDUMP_MINOR 0
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (11 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 12/14] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
2025-10-15 16:06 ` Steven Price
2025-10-15 16:03 ` [PATCH v4 14/14] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
13 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel,
Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
Will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or on which the access
pattern makes non-cacheable mappings inefficient.
Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
v2:
- Add more to the commit message
v3:
- No changes
v4:
- Fix the map_wc test in panfrost_ioctl_query_bo_info()
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 10 ++++++++--
drivers/gpu/drm/panfrost/panfrost_gem.c | 3 +++
include/uapi/drm/panfrost_drm.h | 1 +
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 00c0881fa2f0..9ca08d148f56 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -125,6 +125,10 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
return 0;
}
+#define PANFROST_BO_FLAGS (PANFROST_BO_NOEXEC | \
+ PANFROST_BO_HEAP | \
+ PANFROST_BO_WB_MMAP)
+
static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
struct drm_file *file)
{
@@ -134,8 +138,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
struct panfrost_gem_mapping *mapping;
int ret;
- if (!args->size || args->pad ||
- (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
+ if (!args->size || args->pad || (args->flags & ~PANFROST_BO_FLAGS))
return -EINVAL;
/* Heaps should never be executable */
@@ -661,6 +664,9 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
if (bo->is_heap)
args->create_flags |= PANFROST_BO_HEAP;
+
+ if (!bo->base.map_wc)
+ args->create_flags |= PANFROST_BO_WB_MMAP;
}
drm_gem_object_put(gem_obj);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index da0362202d94..0e8028ee9d1f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -320,6 +320,9 @@ panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
bo->is_heap = !!(flags & PANFROST_BO_HEAP);
+ if (flags & PANFROST_BO_WB_MMAP)
+ bo->base.map_wc = false;
+
return bo;
}
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index e7d01e744efd..244d2f96c2d7 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -104,6 +104,7 @@ struct drm_panfrost_wait_bo {
/* Valid flags to pass to drm_panfrost_create_bo */
#define PANFROST_BO_NOEXEC 1
#define PANFROST_BO_HEAP 2
+#define PANFROST_BO_WB_MMAP 4
/**
* struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 14/14] drm/panfrost: Bump the driver version to 1.6
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
` (12 preceding siblings ...)
2025-10-15 16:03 ` [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-10-15 16:03 ` Boris Brezillon
13 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-15 16:03 UTC (permalink / raw)
To: Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel,
Boris Brezillon
From: Faith Ekstrand <faith.ekstrand@collabora.com>
Bump the driver version to reflect the new cached-CPU mapping
capability.
v2:
- Quickly describe what the new version exposes in the commit message
v3:
- Add Steve's R-b
Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
drivers/gpu/drm/panfrost/panfrost_drv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 9ca08d148f56..90ed3abaab91 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -934,6 +934,9 @@ static void panfrost_debugfs_init(struct drm_minor *minor)
* - 1.4 - adds SET_LABEL_BO
* - 1.5 - adds JM_CTX_{CREATE,DESTROY} ioctls and extend SUBMIT to allow
* context creation with configurable priorities/affinity
+ * - 1.6 - adds PANFROST_BO_MAP_WB, PANFROST_IOCTL_SYNC_BO,
+ * PANFROST_IOCTL_QUERY_BO_INFO and
+ * DRM_PANFROST_PARAM_SELECTED_COHERENCY
*/
static const struct drm_driver panfrost_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
@@ -946,7 +949,7 @@ static const struct drm_driver panfrost_drm_driver = {
.name = "panfrost",
.desc = "panfrost DRM",
.major = 1,
- .minor = 5,
+ .minor = 6,
.gem_create_object = panfrost_gem_create_object,
.gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
--
2.51.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD
2025-10-15 16:03 ` [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-10-15 16:06 ` Steven Price
0 siblings, 0 replies; 40+ messages in thread
From: Steven Price @ 2025-10-15 16:06 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On 15/10/2025 17:03, Boris Brezillon wrote:
> Will be needed if we want to skip CPU cache maintenance operations when
> the GPU can snoop CPU caches.
>
> v2:
> - New commit
>
> v3:
> - Fix the coherency values (enum instead of bitmask)
>
> v4:
> - Fix init/test on coherency_features
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
> drivers/gpu/drm/panfrost/panfrost_drv.c | 1 +
> drivers/gpu/drm/panfrost/panfrost_gpu.c | 26 +++++++++++++++++++---
> drivers/gpu/drm/panfrost/panfrost_regs.h | 10 +++++++--
> include/uapi/drm/panfrost_drm.h | 7 ++++++
> 5 files changed, 40 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> index 1e73efad02a8..bd38b7ae169e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> @@ -75,6 +75,7 @@ struct panfrost_features {
> u32 thread_max_workgroup_sz;
> u32 thread_max_barrier_sz;
> u32 coherency_features;
> + u32 selected_coherency;
> u32 afbc_features;
> u32 texture_features[4];
> u32 js_features[16];
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 607a5b8448d0..3ffcd08f7745 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -94,6 +94,7 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
> PANFROST_FEATURE_ARRAY(JS_FEATURES, js_features, 15);
> PANFROST_FEATURE(NR_CORE_GROUPS, nr_core_groups);
> PANFROST_FEATURE(THREAD_TLS_ALLOC, thread_tls_alloc);
> + PANFROST_FEATURE(SELECTED_COHERENCY, selected_coherency);
>
> case DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP:
> ret = panfrost_ioctl_query_timestamp(pfdev, ¶m->value);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index 174e190ba40f..f350a913c786 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -157,8 +157,8 @@ static void panfrost_gpu_init_quirks(struct panfrost_device *pfdev)
> pfdev->features.revision >= 0x2000)
> quirks |= JM_MAX_JOB_THROTTLE_LIMIT << JM_JOB_THROTTLE_LIMIT_SHIFT;
> else if (panfrost_model_eq(pfdev, 0x6000) &&
> - pfdev->features.coherency_features == COHERENCY_ACE)
> - quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
> + pfdev->features.coherency_features == BIT(COHERENCY_ACE))
> + quirks |= (BIT(COHERENCY_ACE_LITE) | BIT(COHERENCY_ACE)) <<
> JM_FORCE_COHERENCY_FEATURES_SHIFT;
>
> if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
> @@ -260,7 +260,27 @@ static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
> pfdev->features.max_threads = gpu_read(pfdev, GPU_THREAD_MAX_THREADS);
> pfdev->features.thread_max_workgroup_sz = gpu_read(pfdev, GPU_THREAD_MAX_WORKGROUP_SIZE);
> pfdev->features.thread_max_barrier_sz = gpu_read(pfdev, GPU_THREAD_MAX_BARRIER_SIZE);
> - pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
> +
> + if (panfrost_has_hw_feature(pfdev, HW_FEATURE_COHERENCY_REG))
> + pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
> + else
> + pfdev->features.coherency_features = BIT(COHERENCY_ACE_LITE);
> +
> + BUILD_BUG_ON(COHERENCY_ACE_LITE != DRM_PANFROST_GPU_COHERENCY_ACE_LITE);
> + BUILD_BUG_ON(COHERENCY_ACE != DRM_PANFROST_GPU_COHERENCY_ACE);
> + BUILD_BUG_ON(COHERENCY_NONE != DRM_PANFROST_GPU_COHERENCY_NONE);
> +
> + if (!pfdev->coherent) {
> + pfdev->features.selected_coherency = COHERENCY_NONE;
> + } else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE)) {
> + pfdev->features.selected_coherency = COHERENCY_ACE;
> + } else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE_LITE)) {
> + pfdev->features.selected_coherency = COHERENCY_ACE_LITE;
> + } else {
> + drm_WARN(pfdev->ddev, true, "No known coherency protocol supported");
> + pfdev->features.selected_coherency = COHERENCY_NONE;
> + }
> +
> pfdev->features.afbc_features = gpu_read(pfdev, GPU_AFBC_FEATURES);
> for (i = 0; i < 4; i++)
> pfdev->features.texture_features[i] = gpu_read(pfdev, GPU_TEXTURE_FEATURES(i));
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 2b8f1617b836..ee15f6bf6e6f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -102,9 +102,15 @@
> #define GPU_L2_PRESENT_LO 0x120 /* (RO) Level 2 cache present bitmap, low word */
> #define GPU_L2_PRESENT_HI 0x124 /* (RO) Level 2 cache present bitmap, high word */
>
> +/* GPU_COHERENCY_FEATURES is a bitmask of BIT(COHERENCY_xxx) values encoding the
> + * set of supported coherency protocols. GPU_COHERENCY_ENABLE is passed a
> + * COHERENCY_xxx value.
> + */
> #define GPU_COHERENCY_FEATURES 0x300 /* (RO) Coherency features present */
> -#define COHERENCY_ACE_LITE BIT(0)
> -#define COHERENCY_ACE BIT(1)
> +#define GPU_COHERENCY_ENABLE 0x304 /* (RW) Coherency protocol selection */
> +#define COHERENCY_ACE_LITE 0
> +#define COHERENCY_ACE 1
> +#define COHERENCY_NONE 31
>
> #define GPU_STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */
> #define GPU_STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index e8b47c9f6976..9bd8fa401400 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -188,6 +188,13 @@ enum drm_panfrost_param {
> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
> DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
> DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
> + DRM_PANFROST_PARAM_SELECTED_COHERENCY,
> +};
> +
> +enum drm_panfrost_gpu_coherency {
> + DRM_PANFROST_GPU_COHERENCY_ACE_LITE = 0,
> + DRM_PANFROST_GPU_COHERENCY_ACE = 1,
> + DRM_PANFROST_GPU_COHERENCY_NONE = 31,
> };
>
> struct drm_panfrost_get_param {
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable
2025-10-15 16:03 ` [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-10-15 16:06 ` Steven Price
0 siblings, 0 replies; 40+ messages in thread
From: Steven Price @ 2025-10-15 16:06 UTC (permalink / raw)
To: Boris Brezillon
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On 15/10/2025 17:03, Boris Brezillon wrote:
> From: Faith Ekstrand <faith.ekstrand@collabora.com>
>
> Will be used by the UMD to optimize CPU accesses to buffers
> that are frequently read by the CPU, or on which the access
> pattern makes non-cacheable mappings inefficient.
>
> Mapping buffers CPU-cached implies taking care of the CPU
> cache maintenance in the UMD, unless the GPU is IO coherent.
>
> v2:
> - Add more to the commit message
>
> v3:
> - No changes
>
> v4:
> - Fix the map_wc test in panfrost_ioctl_query_bo_info()
>
> Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/panfrost/panfrost_drv.c | 10 ++++++++--
> drivers/gpu/drm/panfrost/panfrost_gem.c | 3 +++
> include/uapi/drm/panfrost_drm.h | 1 +
> 3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 00c0881fa2f0..9ca08d148f56 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -125,6 +125,10 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
> return 0;
> }
>
> +#define PANFROST_BO_FLAGS (PANFROST_BO_NOEXEC | \
> + PANFROST_BO_HEAP | \
> + PANFROST_BO_WB_MMAP)
> +
> static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
> struct drm_file *file)
> {
> @@ -134,8 +138,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
> struct panfrost_gem_mapping *mapping;
> int ret;
>
> - if (!args->size || args->pad ||
> - (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
> + if (!args->size || args->pad || (args->flags & ~PANFROST_BO_FLAGS))
> return -EINVAL;
>
> /* Heaps should never be executable */
> @@ -661,6 +664,9 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
>
> if (bo->is_heap)
> args->create_flags |= PANFROST_BO_HEAP;
> +
> + if (!bo->base.map_wc)
> + args->create_flags |= PANFROST_BO_WB_MMAP;
> }
>
> drm_gem_object_put(gem_obj);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index da0362202d94..0e8028ee9d1f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -320,6 +320,9 @@ panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
> bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
> bo->is_heap = !!(flags & PANFROST_BO_HEAP);
>
> + if (flags & PANFROST_BO_WB_MMAP)
> + bo->base.map_wc = false;
> +
> return bo;
> }
>
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index e7d01e744efd..244d2f96c2d7 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -104,6 +104,7 @@ struct drm_panfrost_wait_bo {
> /* Valid flags to pass to drm_panfrost_create_bo */
> #define PANFROST_BO_NOEXEC 1
> #define PANFROST_BO_HEAP 2
> +#define PANFROST_BO_WB_MMAP 4
>
> /**
> * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
@ 2025-10-16 8:13 ` Maxime Ripard
2025-10-16 12:57 ` Boris Brezillon
2025-10-16 8:32 ` Thomas Zimmermann
2025-10-17 14:32 ` Faith Ekstrand
2 siblings, 1 reply; 40+ messages in thread
From: Maxime Ripard @ 2025-10-16 8:13 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
[-- Attachment #1: Type: text/plain, Size: 3335 bytes --]
Hi,
On Wed, Oct 15, 2025 at 06:03:14PM +0200, Boris Brezillon wrote:
> Prepare things for standardizing synchronization around CPU accesses
> of GEM buffers. This will be used to provide default
> drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> a way for drivers to add their own ioctls to synchronize CPU
> writes/reads when they can't do it directly from userland.
>
> v2:
> - New commit
>
> v3:
> - No changes
>
> v4:
> - Add Steve's R-b
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/drm_gem.c | 10 +++++++++
> include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index a1a9c828938b..a1431e4f2404 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> }
> EXPORT_SYMBOL(drm_gem_vunmap);
>
> +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access)
> +{
> + if (obj->funcs->sync)
> + return obj->funcs->sync(obj, offset, size, access);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_gem_sync);
> +
> /**
> * drm_gem_lock_reservations - Sets up the ww context and acquires
> * the lock on an array of GEM objects.
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 8d48d2af2649..1c33e59ab305 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> DRM_GEM_OBJECT_ACTIVE = BIT(2),
> };
>
> +/**
> + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
Treating an enum as a bitmask is a bit weird to me. I'd say either have
a bitmask with BIT(enum values), or no enum at all.
> + */
> +enum drm_gem_object_access_flags {
> + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> +
> + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> +
> + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
Do we really want to have to variants with the same discriminant? If so,
we should document why it's something we want.
> + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> +
> + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> +
> + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> + DRM_GEM_OBJECT_WRITE_ACCESS,
> +
> + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
Same thing.
Or is it that you encode both the direction and access type, and have a
mask to isolate each?
If so, we should really move it out from an enum into defines, or treat each
separately like dma_sync_does.
Maxime
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 273 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
2025-10-16 8:13 ` Maxime Ripard
@ 2025-10-16 8:32 ` Thomas Zimmermann
2025-10-16 9:40 ` Boris Brezillon
2025-10-17 14:32 ` Faith Ekstrand
2 siblings, 1 reply; 40+ messages in thread
From: Thomas Zimmermann @ 2025-10-16 8:32 UTC (permalink / raw)
To: Boris Brezillon, Steven Price
Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, David Airlie,
Simona Vetter, Faith Ekstrand, kernel
Hi,
on patches 2 to 4: sync is really begin/end access wrapped into one
interface, which I find questionable. I also don't like that these
patches add generic infrastructure for a single driver.
My proposal is to make your own dma_buf exporter in panthor and set the
begin/end_cpu_access functions as you need. Panthor already contains its
own GEM export helper at [1]. If you inline drm_gem_prime_export() [2]
you can set the cpu_access callbacks to panthor-specific code. The other
dma-buf helpers' symbols should be exported and can be re-used. That is
a lot less intrusive and should provide what you need.
I found that a hand full of other DRM drivers implement dma-buf's
begin/end access callbacks. If you want a generic begin/end interface
for GEM, you certainly want to get them on board. If there's something
common to share, this should be done in a separate series.
[1]
https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/panthor/panthor_gem.c#L196
[2]
https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/drm_prime.c#L919
Best regards
Thomas
Am 15.10.25 um 18:03 schrieb Boris Brezillon:
> Prepare things for standardizing synchronization around CPU accesses
> of GEM buffers. This will be used to provide default
> drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> a way for drivers to add their own ioctls to synchronize CPU
> writes/reads when they can't do it directly from userland.
>
> v2:
> - New commit
>
> v3:
> - No changes
>
> v4:
> - Add Steve's R-b
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/drm_gem.c | 10 +++++++++
> include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index a1a9c828938b..a1431e4f2404 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> }
> EXPORT_SYMBOL(drm_gem_vunmap);
>
> +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access)
> +{
> + if (obj->funcs->sync)
> + return obj->funcs->sync(obj, offset, size, access);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_gem_sync);
> +
> /**
> * drm_gem_lock_reservations - Sets up the ww context and acquires
> * the lock on an array of GEM objects.
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 8d48d2af2649..1c33e59ab305 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> DRM_GEM_OBJECT_ACTIVE = BIT(2),
> };
>
> +/**
> + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> + */
> +enum drm_gem_object_access_flags {
> + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> +
> + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> +
> + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> +
> + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> +
> + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> +
> + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> + DRM_GEM_OBJECT_WRITE_ACCESS,
> +
> + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> +};
> +
> /**
> * struct drm_gem_object_funcs - GEM object functions
> */
> @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> */
> int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
>
> + /**
> + * @sync:
> + *
> + * Prepare for CPU/device access. This can involve migration of
> + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> + * the CPU caches. The range can be used to optimize the synchronization
> + * when possible.
> + *
> + * Returns 0 on success, -errno otherwise.
> + *
> + * This callback is optional.
> + */
> + int (*sync)(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access);
> +
> /**
> * @evict:
> *
> @@ -559,6 +601,9 @@ void drm_gem_unlock(struct drm_gem_object *obj);
> int drm_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map);
> void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map);
>
> +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access);
> +
> int drm_gem_objects_lookup(struct drm_file *filp, void __user *bo_handles,
> int count, struct drm_gem_object ***objs_out);
> struct drm_gem_object *drm_gem_object_lookup(struct drm_file *filp, u32 handle);
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-10-15 16:03 ` [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
@ 2025-10-16 8:42 ` Marcin Ślusarz
2025-10-16 9:52 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Marcin Ślusarz @ 2025-10-16 8:42 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, nd
On Wed, Oct 15, 2025 at 06:03:23PM +0200, Boris Brezillon wrote:
> +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> + struct drm_file *file)
> +{
> + struct drm_panfrost_sync_bo *args = data;
> + struct drm_panfrost_bo_sync_op *ops;
> + struct drm_gem_object *obj;
> + int ret;
> + u32 i;
> +
> + if (args->pad)
> + return -EINVAL;
> +
> + ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> + if (!ops) {
> + DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> + return -ENOMEM;
> + }
> +
> + if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> + args->op_count * sizeof(*ops))) {
> + DRM_DEBUG("Failed to copy in BO sync ops\n");
> + ret = -EFAULT;
> + goto err_ops;
> + }
> +
> + for (i = 0; i < args->op_count; i++) {
If args->op_count is 0, if I'm not mistaken, kvmalloc_array and
copy_to_user will succeed, but then this function will return
unitialized value. Maybe add an explicit check for op_count == 0
at the beginning and avoid going through all that code?
> + if (ops[i].flags & ~PANFROST_BO_SYNC_OP_FLAGS) {
> + ret = -EINVAL;
> + goto err_ops;
> + }
> +
> + obj = drm_gem_object_lookup(file, ops[i].handle);
> + if (!obj) {
> + ret = -ENOENT;
> + goto err_ops;
> + }
> +
> + ret = panfrost_gem_sync(obj, ops[i].flags,
> + ops[i].offset, ops[i].size);
> +
> + drm_gem_object_put(obj);
> +
> + if (ret)
> + goto err_ops;
> + }
> +
> +err_ops:
> + kvfree(ops);
> +
> + return ret;
> +}
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 8:32 ` Thomas Zimmermann
@ 2025-10-16 9:40 ` Boris Brezillon
2025-10-16 9:58 ` Thomas Zimmermann
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 9:40 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
Hi Thomas,
On Thu, 16 Oct 2025 10:32:46 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi,
>
> on patches 2 to 4: sync is really begin/end access wrapped into one
> interface, which I find questionable. I also don't like that these
> patches add generic infrastructure for a single driver.
It's actually two drivers (panfrost and panthor), and the interface is
here so other drivers relying on drm_gem_shmem don't have to hand-roll
these things in the future.
>
> My proposal is to make your own dma_buf exporter in panthor and set the
> begin/end_cpu_access functions as you need. Panthor already contains its
> own GEM export helper at [1]. If you inline drm_gem_prime_export() [2]
> you can set the cpu_access callbacks to panthor-specific code. The other
> dma-buf helpers' symbols should be exported and can be re-used. That is
> a lot less intrusive and should provide what you need.
I can of course do that in panthor, but then I'll have to duplicate
the same logic in panfrost. Also, the whole point of making it generic
is so that people don't forget that begin/end_cpu_access() is a thing
they should consider (like happened to me if you look at v2 of this
patchset), otherwise importers of their buffers might have unpleasant
side effects because of missing flush/invalidates. This, IMHO, is a good
reason to have it as a drm_gem_funcs::sync() callback. That, or we
decide that the default dma_buf_ops is not a thing, and we force
developers to think twice when they select the default hooks to pick
for their dma_buf implementation.
>
> I found that a hand full of other DRM drivers implement dma-buf's
> begin/end access callbacks. If you want a generic begin/end interface
> for GEM, you certainly want to get them on board. If there's something
> common to share, this should be done in a separate series.
Fair enough. I'll try to convert freedreno and imagination to this new
interface, and gather some feedback.
Regards,
Boris
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-10-16 8:42 ` Marcin Ślusarz
@ 2025-10-16 9:52 ` Boris Brezillon
2025-10-16 10:02 ` Marcin Ślusarz
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 9:52 UTC (permalink / raw)
To: Marcin Ślusarz
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, nd
On Thu, 16 Oct 2025 10:42:21 +0200
Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> On Wed, Oct 15, 2025 at 06:03:23PM +0200, Boris Brezillon wrote:
> > +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> > + struct drm_file *file)
> > +{
> > + struct drm_panfrost_sync_bo *args = data;
> > + struct drm_panfrost_bo_sync_op *ops;
> > + struct drm_gem_object *obj;
> > + int ret;
> > + u32 i;
> > +
> > + if (args->pad)
> > + return -EINVAL;
> > +
> > + ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> > + if (!ops) {
> > + DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> > + return -ENOMEM;
> > + }
> > +
> > + if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> > + args->op_count * sizeof(*ops))) {
> > + DRM_DEBUG("Failed to copy in BO sync ops\n");
> > + ret = -EFAULT;
> > + goto err_ops;
> > + }
> > +
> > + for (i = 0; i < args->op_count; i++) {
>
> If args->op_count is 0, if I'm not mistaken, kvmalloc_array and
> copy_to_user will succeed, but then this function will return
> unitialized value. Maybe add an explicit check for op_count == 0
> at the beginning and avoid going through all that code?
If args->op_count=0 the loop would be exited right away, so I'm not too
sure where the problem is. This being said, I agree it's not worth
going through kvmalloc_array() and copy_from_user() if we know there's
nothing to do. And it's probably a bit fragile to rely on
kvmalloc_array() not returning NULL when the size is zero (I actually
thought it was), so I agree we'd rather bail out early in that case.
>
> > + if (ops[i].flags & ~PANFROST_BO_SYNC_OP_FLAGS) {
> > + ret = -EINVAL;
> > + goto err_ops;
> > + }
> > +
> > + obj = drm_gem_object_lookup(file, ops[i].handle);
> > + if (!obj) {
> > + ret = -ENOENT;
> > + goto err_ops;
> > + }
> > +
> > + ret = panfrost_gem_sync(obj, ops[i].flags,
> > + ops[i].offset, ops[i].size);
> > +
> > + drm_gem_object_put(obj);
> > +
> > + if (ret)
> > + goto err_ops;
> > + }
> > +
> > +err_ops:
> > + kvfree(ops);
> > +
> > + return ret;
> > +}
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 9:40 ` Boris Brezillon
@ 2025-10-16 9:58 ` Thomas Zimmermann
2025-10-16 10:32 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Thomas Zimmermann @ 2025-10-16 9:58 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
Hi
Am 16.10.25 um 11:40 schrieb Boris Brezillon:
> Hi Thomas,
>
> On Thu, 16 Oct 2025 10:32:46 +0200
> Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
>> Hi,
>>
>> on patches 2 to 4: sync is really begin/end access wrapped into one
>> interface, which I find questionable. I also don't like that these
>> patches add generic infrastructure for a single driver.
> It's actually two drivers (panfrost and panthor), and the interface is
> here so other drivers relying on drm_gem_shmem don't have to hand-roll
> these things in the future.
Ok.
But what about the unrelated drivers? Most GEM SHMEM drivers don't need
this. Looking at patch 4, there's a for loop over the SG table, which
appears to affect all drivers.
You know, there are two types of GEM SHMEM users. The ones like panthor
that build upon it with additional features. And the other type that
only use it as simple buffer storage. The former type what's building
blocks to combine as needed; that latter (actually the majority) wants a
simple solution. And TBH, I've considered spitting GEM SHMEM into two
for this reason.
>
>> My proposal is to make your own dma_buf exporter in panthor and set the
>> begin/end_cpu_access functions as you need. Panthor already contains its
>> own GEM export helper at [1]. If you inline drm_gem_prime_export() [2]
>> you can set the cpu_access callbacks to panthor-specific code. The other
>> dma-buf helpers' symbols should be exported and can be re-used. That is
>> a lot less intrusive and should provide what you need.
> I can of course do that in panthor, but then I'll have to duplicate
> the same logic in panfrost. Also, the whole point of making it generic
> is so that people don't forget that begin/end_cpu_access() is a thing
> they should consider (like happened to me if you look at v2 of this
> patchset), otherwise importers of their buffers might have unpleasant
> side effects because of missing flush/invalidates. This, IMHO, is a good
> reason to have it as a drm_gem_funcs::sync() callback. That, or we
> decide that the default dma_buf_ops is not a thing, and we force
> developers to think twice when they select the default hooks to pick
> for their dma_buf implementation.
See my comment above. Panthor and panfrost should treat GEM SHMEM like a
set of building blocks rather and a full solution.
>
>> I found that a hand full of other DRM drivers implement dma-buf's
>> begin/end access callbacks. If you want a generic begin/end interface
>> for GEM, you certainly want to get them on board. If there's something
>> common to share, this should be done in a separate series.
> Fair enough. I'll try to convert freedreno and imagination to this new
> interface, and gather some feedback.
I did 'git grep \.begin_cpu_access' in the drm directory. This returned,
amdgpu, i915, tegra, omap, and xe. These where the ones I had in mind.
Best regards
Thomas
>
> Regards,
>
> Boris
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-10-16 9:52 ` Boris Brezillon
@ 2025-10-16 10:02 ` Marcin Ślusarz
2025-10-16 10:35 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Marcin Ślusarz @ 2025-10-16 10:02 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, nd
On Thu, Oct 16, 2025 at 11:52:24AM +0200, Boris Brezillon wrote:
> On Thu, 16 Oct 2025 10:42:21 +0200
> Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
>
> > On Wed, Oct 15, 2025 at 06:03:23PM +0200, Boris Brezillon wrote:
> > > +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> > > + struct drm_file *file)
> > > +{
> > > + struct drm_panfrost_sync_bo *args = data;
> > > + struct drm_panfrost_bo_sync_op *ops;
> > > + struct drm_gem_object *obj;
> > > + int ret;
> > > + u32 i;
> > > +
> > > + if (args->pad)
> > > + return -EINVAL;
> > > +
> > > + ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> > > + if (!ops) {
> > > + DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> > > + args->op_count * sizeof(*ops))) {
> > > + DRM_DEBUG("Failed to copy in BO sync ops\n");
> > > + ret = -EFAULT;
> > > + goto err_ops;
> > > + }
> > > +
> > > + for (i = 0; i < args->op_count; i++) {
> >
> > If args->op_count is 0, if I'm not mistaken, kvmalloc_array and
> > copy_to_user will succeed, but then this function will return
> > unitialized value. Maybe add an explicit check for op_count == 0
> > at the beginning and avoid going through all that code?
>
> If args->op_count=0 the loop would be exited right away, so I'm not too
> sure where the problem is.
Maybe I didn't explain it correctly, so let me clear that up:
ret is not initialized and not set anywhere if op_count is 0.
> This being said, I agree it's not worth
> going through kvmalloc_array() and copy_from_user() if we know there's
> nothing to do. And it's probably a bit fragile to rely on
> kvmalloc_array() not returning NULL when the size is zero (I actually
> thought it was), so I agree we'd rather bail out early in that case.
>
> >
> > > + if (ops[i].flags & ~PANFROST_BO_SYNC_OP_FLAGS) {
> > > + ret = -EINVAL;
> > > + goto err_ops;
> > > + }
> > > +
> > > + obj = drm_gem_object_lookup(file, ops[i].handle);
> > > + if (!obj) {
> > > + ret = -ENOENT;
> > > + goto err_ops;
> > > + }
> > > +
> > > + ret = panfrost_gem_sync(obj, ops[i].flags,
> > > + ops[i].offset, ops[i].size);
> > > +
> > > + drm_gem_object_put(obj);
> > > +
> > > + if (ret)
> > > + goto err_ops;
> > > + }
> > > +
> > > +err_ops:
> > > + kvfree(ops);
> > > +
> > > + return ret;
> > > +}
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 9:58 ` Thomas Zimmermann
@ 2025-10-16 10:32 ` Boris Brezillon
2025-10-16 11:42 ` Thomas Zimmermann
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 10:32 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On Thu, 16 Oct 2025 11:58:21 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> Am 16.10.25 um 11:40 schrieb Boris Brezillon:
> > Hi Thomas,
> >
> > On Thu, 16 Oct 2025 10:32:46 +0200
> > Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >
> >> Hi,
> >>
> >> on patches 2 to 4: sync is really begin/end access wrapped into one
> >> interface, which I find questionable. I also don't like that these
> >> patches add generic infrastructure for a single driver.
> > It's actually two drivers (panfrost and panthor), and the interface is
> > here so other drivers relying on drm_gem_shmem don't have to hand-roll
> > these things in the future.
>
> Ok.
>
> But what about the unrelated drivers? Most GEM SHMEM drivers don't need
> this. Looking at patch 4, there's a for loop over the SG table, which
> appears to affect all drivers.
It doesn't. First off, this ::sync hook is optional, and if you look at
patch 4's commit message, it says:
"
We provide a drm_gem_shmem_object_sync() for drivers that wants to
have this default implementation as their drm_gem_object_funcs::sync,
but we don't set drm_gem_shmem_funcs::sync to
drm_gem_shmem_object_sync() to avoid changing the behavior of drivers
currently relying on the default gem_shmem vtable.
"
>
> You know, there are two types of GEM SHMEM users. The ones like panthor
> that build upon it with additional features. And the other type that
> only use it as simple buffer storage. The former type what's building
> blocks to combine as needed; that latter (actually the majority) wants a
> simple solution. And TBH, I've considered spitting GEM SHMEM into two
> for this reason.
That's the very reason I don't force
::sync = drm_gem_shmem_sync
but leave it to each driver to decide whether they want it or not.
>
> >
> >> My proposal is to make your own dma_buf exporter in panthor and set the
> >> begin/end_cpu_access functions as you need. Panthor already contains its
> >> own GEM export helper at [1]. If you inline drm_gem_prime_export() [2]
> >> you can set the cpu_access callbacks to panthor-specific code. The other
> >> dma-buf helpers' symbols should be exported and can be re-used. That is
> >> a lot less intrusive and should provide what you need.
> > I can of course do that in panthor, but then I'll have to duplicate
> > the same logic in panfrost. Also, the whole point of making it generic
> > is so that people don't forget that begin/end_cpu_access() is a thing
> > they should consider (like happened to me if you look at v2 of this
> > patchset), otherwise importers of their buffers might have unpleasant
> > side effects because of missing flush/invalidates. This, IMHO, is a good
> > reason to have it as a drm_gem_funcs::sync() callback. That, or we
> > decide that the default dma_buf_ops is not a thing, and we force
> > developers to think twice when they select the default hooks to pick
> > for their dma_buf implementation.
>
> See my comment above. Panthor and panfrost should treat GEM SHMEM like a
> set of building blocks rather and a full solution.
That's exactly what it does. Panfrost/Panthor have their own
drm_gem_object_funcs, they just use the default drm_gem_shmem_sync
implementation provided by this patch, that's all. And yes,
hand-rolling your own drm_prime implem is not only annoying, it's also
error prone because some of the gem_shmem helpers [1] might rely on
drm_gem_is_prime_exported_dma_buf(), which checks the dma_buf ops
against the default drm_prime implementation.
>
> >
> >> I found that a hand full of other DRM drivers implement dma-buf's
> >> begin/end access callbacks. If you want a generic begin/end interface
> >> for GEM, you certainly want to get them on board. If there's something
> >> common to share, this should be done in a separate series.
> > Fair enough. I'll try to convert freedreno and imagination to this new
> > interface, and gather some feedback.
>
> I did 'git grep \.begin_cpu_access' in the drm directory. This returned,
> amdgpu, i915, tegra, omap, and xe. These where the ones I had in mind.
Hm, so tegra/omap should be relatively easy, but I'm reluctant to touch
the big drivers here (amdgpu, i915 and xe). Because it's an opt-in, and
because these drivers already have hand-rolled these dma_buf_ops, I'd
rather let their owners deal with the transition if they want to. IOW,
if you ask me to find new users, I'd like to focus on relatively small
drivers, and primarily those relying on drm_gem_shmem already. Note
that there might be drivers lacking a {begin,end}_cpu_access()
implementation, but don't know it, because there's very few use cases
where GPU is the exporter and the importer is not the same GPU device.
The other reason would be that most drivers relying on gem_shmem set
::map_wc=true unconditionally (CPU mappings are not cached, and thus
don't require synchronization), or just that those devices are IO
coherent.
Honestly, I'm not too sure why this is a problem if this hook is
optional. If it turns out to be too simple for more complex use cases
others have, it can still be extended when those drivers transition to
this ::sync() approach, as no in-kernel API is immutable. And in the
meantime, we have a solution for two drivers that doesn't imply
duplicating a bunch of drm_prime boiler-plate that's otherwise rather
generic.
[1]https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L825
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
2025-10-16 10:02 ` Marcin Ślusarz
@ 2025-10-16 10:35 ` Boris Brezillon
0 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 10:35 UTC (permalink / raw)
To: Marcin Ślusarz
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, nd
On Thu, 16 Oct 2025 12:02:10 +0200
Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> On Thu, Oct 16, 2025 at 11:52:24AM +0200, Boris Brezillon wrote:
> > On Thu, 16 Oct 2025 10:42:21 +0200
> > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> >
> > > On Wed, Oct 15, 2025 at 06:03:23PM +0200, Boris Brezillon wrote:
> > > > +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> > > > + struct drm_file *file)
> > > > +{
> > > > + struct drm_panfrost_sync_bo *args = data;
> > > > + struct drm_panfrost_bo_sync_op *ops;
> > > > + struct drm_gem_object *obj;
> > > > + int ret;
> > > > + u32 i;
> > > > +
> > > > + if (args->pad)
> > > > + return -EINVAL;
> > > > +
> > > > + ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> > > > + if (!ops) {
> > > > + DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> > > > + return -ENOMEM;
> > > > + }
> > > > +
> > > > + if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> > > > + args->op_count * sizeof(*ops))) {
> > > > + DRM_DEBUG("Failed to copy in BO sync ops\n");
> > > > + ret = -EFAULT;
> > > > + goto err_ops;
> > > > + }
> > > > +
> > > > + for (i = 0; i < args->op_count; i++) {
> > >
> > > If args->op_count is 0, if I'm not mistaken, kvmalloc_array and
> > > copy_to_user will succeed, but then this function will return
> > > unitialized value. Maybe add an explicit check for op_count == 0
> > > at the beginning and avoid going through all that code?
> >
> > If args->op_count=0 the loop would be exited right away, so I'm not too
> > sure where the problem is.
>
> Maybe I didn't explain it correctly, so let me clear that up:
> ret is not initialized and not set anywhere if op_count is 0.
Ah, right. I overlooked the fact ret wasn't initialized. Sorry for the
noise.
>
> > This being said, I agree it's not worth
> > going through kvmalloc_array() and copy_from_user() if we know there's
> > nothing to do. And it's probably a bit fragile to rely on
> > kvmalloc_array() not returning NULL when the size is zero (I actually
> > thought it was), so I agree we'd rather bail out early in that case.
> >
> > >
> > > > + if (ops[i].flags & ~PANFROST_BO_SYNC_OP_FLAGS) {
> > > > + ret = -EINVAL;
> > > > + goto err_ops;
> > > > + }
> > > > +
> > > > + obj = drm_gem_object_lookup(file, ops[i].handle);
> > > > + if (!obj) {
> > > > + ret = -ENOENT;
> > > > + goto err_ops;
> > > > + }
> > > > +
> > > > + ret = panfrost_gem_sync(obj, ops[i].flags,
> > > > + ops[i].offset, ops[i].size);
> > > > +
> > > > + drm_gem_object_put(obj);
> > > > +
> > > > + if (ret)
> > > > + goto err_ops;
> > > > + }
> > > > +
> > > > +err_ops:
> > > > + kvfree(ops);
> > > > +
> > > > + return ret;
> > > > +}
> >
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 10:32 ` Boris Brezillon
@ 2025-10-16 11:42 ` Thomas Zimmermann
2025-10-16 12:24 ` Boris Brezillon
2025-10-30 14:11 ` Boris Brezillon
0 siblings, 2 replies; 40+ messages in thread
From: Thomas Zimmermann @ 2025-10-16 11:42 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
Hi
Am 16.10.25 um 12:32 schrieb Boris Brezillon:
> On Thu, 16 Oct 2025 11:58:21 +0200
> Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
>> Hi
>>
>> Am 16.10.25 um 11:40 schrieb Boris Brezillon:
>>> Hi Thomas,
>>>
>>> On Thu, 16 Oct 2025 10:32:46 +0200
>>> Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> on patches 2 to 4: sync is really begin/end access wrapped into one
>>>> interface, which I find questionable. I also don't like that these
>>>> patches add generic infrastructure for a single driver.
>>> It's actually two drivers (panfrost and panthor), and the interface is
>>> here so other drivers relying on drm_gem_shmem don't have to hand-roll
>>> these things in the future.
>> Ok.
>>
>> But what about the unrelated drivers? Most GEM SHMEM drivers don't need
>> this. Looking at patch 4, there's a for loop over the SG table, which
>> appears to affect all drivers.
> It doesn't. First off, this ::sync hook is optional, and if you look at
> patch 4's commit message, it says:
>
> "
> We provide a drm_gem_shmem_object_sync() for drivers that wants to
> have this default implementation as their drm_gem_object_funcs::sync,
> but we don't set drm_gem_shmem_funcs::sync to
> drm_gem_shmem_object_sync() to avoid changing the behavior of drivers
> currently relying on the default gem_shmem vtable.
> "
>
>> You know, there are two types of GEM SHMEM users. The ones like panthor
>> that build upon it with additional features. And the other type that
>> only use it as simple buffer storage. The former type what's building
>> blocks to combine as needed; that latter (actually the majority) wants a
>> simple solution. And TBH, I've considered spitting GEM SHMEM into two
>> for this reason.
> That's the very reason I don't force
>
> ::sync = drm_gem_shmem_sync
>
> but leave it to each driver to decide whether they want it or not.
Apologies for misunderstanding. My impression was that the sync helper
is now the default. But that brings me to another question: where do you
set the gem callback? I applied your series and grepped for it. It's not
set in the panthor/panfrost gem funcs nor the default funcs for shmem. I
can only find the sync calls in the ioctl code.
>
>>>
>>>> My proposal is to make your own dma_buf exporter in panthor and set the
>>>> begin/end_cpu_access functions as you need. Panthor already contains its
>>>> own GEM export helper at [1]. If you inline drm_gem_prime_export() [2]
>>>> you can set the cpu_access callbacks to panthor-specific code. The other
>>>> dma-buf helpers' symbols should be exported and can be re-used. That is
>>>> a lot less intrusive and should provide what you need.
>>> I can of course do that in panthor, but then I'll have to duplicate
>>> the same logic in panfrost. Also, the whole point of making it generic
>>> is so that people don't forget that begin/end_cpu_access() is a thing
>>> they should consider (like happened to me if you look at v2 of this
>>> patchset), otherwise importers of their buffers might have unpleasant
>>> side effects because of missing flush/invalidates. This, IMHO, is a good
>>> reason to have it as a drm_gem_funcs::sync() callback. That, or we
>>> decide that the default dma_buf_ops is not a thing, and we force
>>> developers to think twice when they select the default hooks to pick
>>> for their dma_buf implementation.
>> See my comment above. Panthor and panfrost should treat GEM SHMEM like a
>> set of building blocks rather and a full solution.
> That's exactly what it does. Panfrost/Panthor have their own
> drm_gem_object_funcs, they just use the default drm_gem_shmem_sync
> implementation provided by this patch, that's all. And yes,
> hand-rolling your own drm_prime implem is not only annoying, it's also
> error prone because some of the gem_shmem helpers [1] might rely on
> drm_gem_is_prime_exported_dma_buf(), which checks the dma_buf ops
> against the default drm_prime implementation.
I know, but that's a shortcoming of the current implementation. IIRC all
driver's prime importers have code like [1] at the top, so we probably
should do that in common code.
But that's not the point here. I really do think that drivers either use
a simple turn-key solution OR hand-pick the building blocks for the
complex scenarios. There's no middle ground IMHO. IIRC the original TTM
was supposed to provide a full solution for the complex scenarios and
failed at that.
>
>>>
>>>> I found that a hand full of other DRM drivers implement dma-buf's
>>>> begin/end access callbacks. If you want a generic begin/end interface
>>>> for GEM, you certainly want to get them on board. If there's something
>>>> common to share, this should be done in a separate series.
>>> Fair enough. I'll try to convert freedreno and imagination to this new
>>> interface, and gather some feedback.
>> I did 'git grep \.begin_cpu_access' in the drm directory. This returned,
>> amdgpu, i915, tegra, omap, and xe. These where the ones I had in mind.
> Hm, so tegra/omap should be relatively easy, but I'm reluctant to touch
> the big drivers here (amdgpu, i915 and xe). Because it's an opt-in, and
> because these drivers already have hand-rolled these dma_buf_ops, I'd
> rather let their owners deal with the transition if they want to. IOW,
> if you ask me to find new users, I'd like to focus on relatively small
> drivers, and primarily those relying on drm_gem_shmem already. Note
> that there might be drivers lacking a {begin,end}_cpu_access()
> implementation, but don't know it, because there's very few use cases
> where GPU is the exporter and the importer is not the same GPU device.
> The other reason would be that most drivers relying on gem_shmem set
> ::map_wc=true unconditionally (CPU mappings are not cached, and thus
> don't require synchronization), or just that those devices are IO
> coherent.
I'm not asking you to rewrite all these drivers, but to take them into
consideration. Feedback from their devs would be valuable. Semantics of
similar operations in dma-buf and GEM sometimes differ in subtle ways.
Things like pinning, locking and eviction might be of importance to make
sync work for all drivers.
>
> Honestly, I'm not too sure why this is a problem if this hook is
> optional. If it turns out to be too simple for more complex use cases
> others have, it can still be extended when those drivers transition to
> this ::sync() approach, as no in-kernel API is immutable. And in the
> meantime, we have a solution for two drivers that doesn't imply
> duplicating a bunch of drm_prime boiler-plate that's otherwise rather
> generic.
The prime code you'd have to duplicate is just 10 lines, plus some small
per-driver code. Most of that being data-structure inits.
I want to point out that I'm not opposing the general idea of GEM sync,
but I think it should get more feedback from others. It's supposed to be
a generic interface after all. Hence I was asking to put all this into a
separate series.
Best regards
Thomas
>
> [1]https://elixir.bootlin.com/linux/v6.17.1/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L825
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 11:42 ` Thomas Zimmermann
@ 2025-10-16 12:24 ` Boris Brezillon
2025-10-30 14:11 ` Boris Brezillon
1 sibling, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 12:24 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On Thu, 16 Oct 2025 13:42:59 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> Am 16.10.25 um 12:32 schrieb Boris Brezillon:
> > On Thu, 16 Oct 2025 11:58:21 +0200
> > Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >
> >> Hi
> >>
> >> Am 16.10.25 um 11:40 schrieb Boris Brezillon:
> >>> Hi Thomas,
> >>>
> >>> On Thu, 16 Oct 2025 10:32:46 +0200
> >>> Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> on patches 2 to 4: sync is really begin/end access wrapped into one
> >>>> interface, which I find questionable. I also don't like that these
> >>>> patches add generic infrastructure for a single driver.
> >>> It's actually two drivers (panfrost and panthor), and the interface is
> >>> here so other drivers relying on drm_gem_shmem don't have to hand-roll
> >>> these things in the future.
> >> Ok.
> >>
> >> But what about the unrelated drivers? Most GEM SHMEM drivers don't need
> >> this. Looking at patch 4, there's a for loop over the SG table, which
> >> appears to affect all drivers.
> > It doesn't. First off, this ::sync hook is optional, and if you look at
> > patch 4's commit message, it says:
> >
> > "
> > We provide a drm_gem_shmem_object_sync() for drivers that wants to
> > have this default implementation as their drm_gem_object_funcs::sync,
> > but we don't set drm_gem_shmem_funcs::sync to
> > drm_gem_shmem_object_sync() to avoid changing the behavior of drivers
> > currently relying on the default gem_shmem vtable.
> > "
> >
> >> You know, there are two types of GEM SHMEM users. The ones like panthor
> >> that build upon it with additional features. And the other type that
> >> only use it as simple buffer storage. The former type what's building
> >> blocks to combine as needed; that latter (actually the majority) wants a
> >> simple solution. And TBH, I've considered spitting GEM SHMEM into two
> >> for this reason.
> > That's the very reason I don't force
> >
> > ::sync = drm_gem_shmem_sync
> >
> > but leave it to each driver to decide whether they want it or not.
>
> Apologies for misunderstanding. My impression was that the sync helper
> is now the default. But that brings me to another question: where do you
> set the gem callback? I applied your series and grepped for it. It's not
> set in the panthor/panfrost gem funcs nor the default funcs for shmem. I
> can only find the sync calls in the ioctl code.
Oops. Looks like I forgot to explicitly assign the sync hook
:face_palm:.
>
> >
> >>>
> >>>> My proposal is to make your own dma_buf exporter in panthor and set the
> >>>> begin/end_cpu_access functions as you need. Panthor already contains its
> >>>> own GEM export helper at [1]. If you inline drm_gem_prime_export() [2]
> >>>> you can set the cpu_access callbacks to panthor-specific code. The other
> >>>> dma-buf helpers' symbols should be exported and can be re-used. That is
> >>>> a lot less intrusive and should provide what you need.
> >>> I can of course do that in panthor, but then I'll have to duplicate
> >>> the same logic in panfrost. Also, the whole point of making it generic
> >>> is so that people don't forget that begin/end_cpu_access() is a thing
> >>> they should consider (like happened to me if you look at v2 of this
> >>> patchset), otherwise importers of their buffers might have unpleasant
> >>> side effects because of missing flush/invalidates. This, IMHO, is a good
> >>> reason to have it as a drm_gem_funcs::sync() callback. That, or we
> >>> decide that the default dma_buf_ops is not a thing, and we force
> >>> developers to think twice when they select the default hooks to pick
> >>> for their dma_buf implementation.
> >> See my comment above. Panthor and panfrost should treat GEM SHMEM like a
> >> set of building blocks rather and a full solution.
> > That's exactly what it does. Panfrost/Panthor have their own
> > drm_gem_object_funcs, they just use the default drm_gem_shmem_sync
> > implementation provided by this patch, that's all. And yes,
> > hand-rolling your own drm_prime implem is not only annoying, it's also
> > error prone because some of the gem_shmem helpers [1] might rely on
> > drm_gem_is_prime_exported_dma_buf(), which checks the dma_buf ops
> > against the default drm_prime implementation.
>
> I know, but that's a shortcoming of the current implementation. IIRC all
> driver's prime importers have code like [1] at the top, so we probably
> should do that in common code.
Right (I mention it later in this reply), but that's assuming the rest
of the helpers are truly designed to be generic, and that would require
going through all of these to be sure.
>
> But that's not the point here.
It kinda is though. Initially you said it's 10 lines of code only doing
struct init or whatnot, but it's actually more than that in practice.
And this glue is just the sort of annoying code base that you'd rather
have centralized so you don't have to fix an entire stack of drivers
when you fix something in the default helper.
> I really do think that drivers either use
> a simple turn-key solution OR hand-pick the building blocks for the
> complex scenarios. There's no middle ground IMHO. IIRC the original TTM
> was supposed to provide a full solution for the complex scenarios and
> failed at that.
But is it really a complex scenario I'm trying to support here? I'm
basically translating the dma_buf semantics into some standard DRM
hook. In a sense, that's what all drivers implementing their own
::begin/end_cpu_access() do already...
>
>
> >
> >>>
> >>>> I found that a hand full of other DRM drivers implement dma-buf's
> >>>> begin/end access callbacks. If you want a generic begin/end interface
> >>>> for GEM, you certainly want to get them on board. If there's something
> >>>> common to share, this should be done in a separate series.
> >>> Fair enough. I'll try to convert freedreno and imagination to this new
> >>> interface, and gather some feedback.
> >> I did 'git grep \.begin_cpu_access' in the drm directory. This returned,
> >> amdgpu, i915, tegra, omap, and xe. These where the ones I had in mind.
> > Hm, so tegra/omap should be relatively easy, but I'm reluctant to touch
> > the big drivers here (amdgpu, i915 and xe). Because it's an opt-in, and
> > because these drivers already have hand-rolled these dma_buf_ops, I'd
> > rather let their owners deal with the transition if they want to. IOW,
> > if you ask me to find new users, I'd like to focus on relatively small
> > drivers, and primarily those relying on drm_gem_shmem already. Note
> > that there might be drivers lacking a {begin,end}_cpu_access()
> > implementation, but don't know it, because there's very few use cases
> > where GPU is the exporter and the importer is not the same GPU device.
> > The other reason would be that most drivers relying on gem_shmem set
> > ::map_wc=true unconditionally (CPU mappings are not cached, and thus
> > don't require synchronization), or just that those devices are IO
> > coherent.
>
> I'm not asking you to rewrite all these drivers, but to take them into
> consideration.
I did. I looked at Tegra, Xe and MSM for instance, and given the sync()
hook semantics are very close to the begin/end_cpu_access() ones
(except things are combined in one hook instead of two), I thought it
could achieve the same thing custom begin/end_cpu_access() implems do.
> Feedback from their devs would be valuable. Semantics of
> similar operations in dma-buf and GEM sometimes differ in subtle ways.
> Things like pinning, locking and eviction might be of importance to make
> sync work for all drivers.
>
> >
> > Honestly, I'm not too sure why this is a problem if this hook is
> > optional. If it turns out to be too simple for more complex use cases
> > others have, it can still be extended when those drivers transition to
> > this ::sync() approach, as no in-kernel API is immutable. And in the
> > meantime, we have a solution for two drivers that doesn't imply
> > duplicating a bunch of drm_prime boiler-plate that's otherwise rather
> > generic.
>
> The prime code you'd have to duplicate is just 10 lines, plus some small
> per-driver code. Most of that being data-structure inits.
But it's more than just the struct init you have to duplicate. Look at
drm_gem_prime_import_dev() function for instance. It has
drm_gem_is_prime_exported_dma_buf() in it, so all of a sudden, you pull
a custom implem of drm_gem_prime_import() too, and I suspect that's not
the only thing that needs to be duplicated. Yes there are ways
around it, like having the dma_buf_ops stored at the drm_device
level, so drm_gem_is_prime_exported_dma_buf() can test against
dev->dma_buf_ops instead of &drm_gem_prime_dmabuf_ops, but my point
stands, it's usually more complicate than 10 lines of simple struct
inits as you seem to imply.
FWIW, I originally considered hand-rolling the drm_prime code for
panthor, and realized I'd have to do the same for panfrost, because
despite the mix-and-match approach that originally drove this
code-sharing mantra, there's just a lot helpers that are still designed
to be used only with the default implementation. Given the notion of
synchronization is common to all HW (I mean, dma_buf abstracts that
sync-around-cpu-access concept, so why can't DRM do the same?), I
figured it'd be way simpler, and more useful to others to have this
centralized at the GEM level.
>
> I want to point out that I'm not opposing the general idea of GEM sync,
> but I think it should get more feedback from others. It's supposed to be
> a generic interface after all. Hence I was asking to put all this into a
> separate series.
Ack. I'll post a new version with a few more drivers converted to it,
and try to Cc the intel/nouveau/amd people so I can get their feedback
too.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 8:13 ` Maxime Ripard
@ 2025-10-16 12:57 ` Boris Brezillon
2025-10-16 13:07 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 12:57 UTC (permalink / raw)
To: Maxime Ripard
Cc: Steven Price, dri-devel, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On Thu, 16 Oct 2025 10:13:25 +0200
Maxime Ripard <mripard@kernel.org> wrote:
> Hi,
>
> On Wed, Oct 15, 2025 at 06:03:14PM +0200, Boris Brezillon wrote:
> > Prepare things for standardizing synchronization around CPU accesses
> > of GEM buffers. This will be used to provide default
> > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > a way for drivers to add their own ioctls to synchronize CPU
> > writes/reads when they can't do it directly from userland.
> >
> > v2:
> > - New commit
> >
> > v3:
> > - No changes
> >
> > v4:
> > - Add Steve's R-b
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Reviewed-by: Steven Price <steven.price@arm.com>
> > ---
> > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 55 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > index a1a9c828938b..a1431e4f2404 100644
> > --- a/drivers/gpu/drm/drm_gem.c
> > +++ b/drivers/gpu/drm/drm_gem.c
> > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > }
> > EXPORT_SYMBOL(drm_gem_vunmap);
> >
> > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > + enum drm_gem_object_access_flags access)
> > +{
> > + if (obj->funcs->sync)
> > + return obj->funcs->sync(obj, offset, size, access);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(drm_gem_sync);
> > +
> > /**
> > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > * the lock on an array of GEM objects.
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 8d48d2af2649..1c33e59ab305 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > };
> >
> > +/**
> > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
>
> Treating an enum as a bitmask is a bit weird to me. I'd say either have
> a bitmask with BIT(enum values), or no enum at all.
I'll drop the enum and make it pure defines.
>
> > + */
> > +enum drm_gem_object_access_flags {
> > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > +
> > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > +
> > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
>
> Do we really want to have to variants with the same discriminant? If so,
> we should document why it's something we want.
>
> > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > +
> > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > +
> > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > +
> > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
>
> Same thing.
>
> Or is it that you encode both the direction and access type, and have a
> mask to isolate each?
This ^.
>
> If so, we should really move it out from an enum into defines, or treat each
> separately like dma_sync_does.
Sure, I can do that.
>
> Maxime
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 12:57 ` Boris Brezillon
@ 2025-10-16 13:07 ` Boris Brezillon
2025-10-29 9:30 ` Maxime Ripard
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-16 13:07 UTC (permalink / raw)
To: Maxime Ripard
Cc: Steven Price, dri-devel, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On Thu, 16 Oct 2025 14:57:08 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> On Thu, 16 Oct 2025 10:13:25 +0200
> Maxime Ripard <mripard@kernel.org> wrote:
>
> > Hi,
> >
> > On Wed, Oct 15, 2025 at 06:03:14PM +0200, Boris Brezillon wrote:
> > > Prepare things for standardizing synchronization around CPU accesses
> > > of GEM buffers. This will be used to provide default
> > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > a way for drivers to add their own ioctls to synchronize CPU
> > > writes/reads when they can't do it directly from userland.
> > >
> > > v2:
> > > - New commit
> > >
> > > v3:
> > > - No changes
> > >
> > > v4:
> > > - Add Steve's R-b
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > ---
> > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 55 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > index a1a9c828938b..a1431e4f2404 100644
> > > --- a/drivers/gpu/drm/drm_gem.c
> > > +++ b/drivers/gpu/drm/drm_gem.c
> > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > }
> > > EXPORT_SYMBOL(drm_gem_vunmap);
> > >
> > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > + enum drm_gem_object_access_flags access)
> > > +{
> > > + if (obj->funcs->sync)
> > > + return obj->funcs->sync(obj, offset, size, access);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_gem_sync);
> > > +
> > > /**
> > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > * the lock on an array of GEM objects.
> > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > index 8d48d2af2649..1c33e59ab305 100644
> > > --- a/include/drm/drm_gem.h
> > > +++ b/include/drm/drm_gem.h
> > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > };
> > >
> > > +/**
> > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> >
> > Treating an enum as a bitmask is a bit weird to me. I'd say either have
> > a bitmask with BIT(enum values), or no enum at all.
>
> I'll drop the enum and make it pure defines.
>
> >
> > > + */
> > > +enum drm_gem_object_access_flags {
> > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > +
> > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > +
> > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> >
> > Do we really want to have to variants with the same discriminant? If so,
> > we should document why it's something we want.
> >
> > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > +
> > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > +
> > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > +
> > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> >
> > Same thing.
> >
> > Or is it that you encode both the direction and access type, and have a
> > mask to isolate each?
>
> This ^.
>
> >
> > If so, we should really move it out from an enum into defines, or treat each
> > separately like dma_sync_does.
>
> Sure, I can do that.
Actually, looking at the enum just above the one added in this patch
(drm_gem_object_status), it seems that it has the same flaws, and I
think it was the reason I went for this enum-based approach, because I
tend to be consistent with the code base I'm modifying.
Now, I get that defining flags with an enum and then composing those to
then pass the composition to some helper pretending it's still an enum
only works in C (because with C you can do anything you want :D), and
probably not if you're in pedantic mode. But if we want to enforce
that, we should probably fix the existing code base, otherwise this
will keep happening ;-). And no, before you ask, I'm not volunteering
for this :P.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
2025-10-16 8:13 ` Maxime Ripard
2025-10-16 8:32 ` Thomas Zimmermann
@ 2025-10-17 14:32 ` Faith Ekstrand
2025-10-17 14:40 ` Faith Ekstrand
2 siblings, 1 reply; 40+ messages in thread
From: Faith Ekstrand @ 2025-10-17 14:32 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel
On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> Prepare things for standardizing synchronization around CPU accesses
> of GEM buffers. This will be used to provide default
> drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> a way for drivers to add their own ioctls to synchronize CPU
> writes/reads when they can't do it directly from userland.
>
> v2:
> - New commit
>
> v3:
> - No changes
>
> v4:
> - Add Steve's R-b
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> ---
> drivers/gpu/drm/drm_gem.c | 10 +++++++++
> include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index a1a9c828938b..a1431e4f2404 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> }
> EXPORT_SYMBOL(drm_gem_vunmap);
>
> +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access)
> +{
> + if (obj->funcs->sync)
> + return obj->funcs->sync(obj, offset, size, access);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_gem_sync);
> +
> /**
> * drm_gem_lock_reservations - Sets up the ww context and acquires
> * the lock on an array of GEM objects.
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 8d48d2af2649..1c33e59ab305 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> DRM_GEM_OBJECT_ACTIVE = BIT(2),
> };
>
> +/**
> + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> + */
> +enum drm_gem_object_access_flags {
> + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> +
> + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> +
> + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> +
> + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> +
> + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> +
> + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> + DRM_GEM_OBJECT_WRITE_ACCESS,
> +
> + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> +};
> +
> /**
> * struct drm_gem_object_funcs - GEM object functions
> */
> @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> */
> int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
>
> + /**
> + * @sync:
> + *
> + * Prepare for CPU/device access. This can involve migration of
> + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> + * the CPU caches. The range can be used to optimize the synchronization
> + * when possible.
This has gone in a very different direction from the version I sent
out and the added generality makes me really nervous. The idea of sync
involving migration and that the range is a mere hint are antithetical
with Vulkan. It's a very GLish design that assumes that a BO is
exclusively used by one of the CPU or the GPU at the same time. This
simply isn't the case in modern APIs. Older DRM uAPIs (as well as
dma-buf itself) are littered with such ioctls and we're in the process
of deleting them all.
If the BO needs to be migrated in order to be accessed from the CPU,
that needs to happen on map, not on some sort of begin/end. Or better
yet, just disallow mapping such buffers. Once the client has a map,
they are free to access from the CPU while stuff is running on the
GPU. They have to be careful, of course, not to cause data races, but
accessing the same BO from the CPU and GPU or even the same range is
totally okay if you aren't racing.
As a corollary, just don't map PRIME buffers.
And the range really shouldn't be just a hint. With Vulkan, clients
are regularly sub-allocating from larger memory objects. If they ask
to flush 64B and end up flushing 64M, that's pretty bad.
All we need is something which lets us trap through to the kernel for
CPU cache management. That's all we need and that's really all it
should do.
~Faith
> + *
> + * Returns 0 on success, -errno otherwise.
> + *
> + * This callback is optional.
> + */
> + int (*sync)(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access);
> +
> /**
> * @evict:
> *
> @@ -559,6 +601,9 @@ void drm_gem_unlock(struct drm_gem_object *obj);
> int drm_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map);
> void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map);
>
> +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> + enum drm_gem_object_access_flags access);
> +
> int drm_gem_objects_lookup(struct drm_file *filp, void __user *bo_handles,
> int count, struct drm_gem_object ***objs_out);
> struct drm_gem_object *drm_gem_object_lookup(struct drm_file *filp, u32 handle);
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-17 14:32 ` Faith Ekstrand
@ 2025-10-17 14:40 ` Faith Ekstrand
2025-10-17 15:26 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Faith Ekstrand @ 2025-10-17 14:40 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel
On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:
>
> On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > Prepare things for standardizing synchronization around CPU accesses
> > of GEM buffers. This will be used to provide default
> > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > a way for drivers to add their own ioctls to synchronize CPU
> > writes/reads when they can't do it directly from userland.
> >
> > v2:
> > - New commit
> >
> > v3:
> > - No changes
> >
> > v4:
> > - Add Steve's R-b
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Reviewed-by: Steven Price <steven.price@arm.com>
> > ---
> > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 55 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > index a1a9c828938b..a1431e4f2404 100644
> > --- a/drivers/gpu/drm/drm_gem.c
> > +++ b/drivers/gpu/drm/drm_gem.c
> > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > }
> > EXPORT_SYMBOL(drm_gem_vunmap);
> >
> > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > + enum drm_gem_object_access_flags access)
> > +{
> > + if (obj->funcs->sync)
> > + return obj->funcs->sync(obj, offset, size, access);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(drm_gem_sync);
> > +
> > /**
> > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > * the lock on an array of GEM objects.
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 8d48d2af2649..1c33e59ab305 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > };
> >
> > +/**
> > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > + */
> > +enum drm_gem_object_access_flags {
> > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > +
> > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > +
> > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > +
> > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > +
> > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > +
> > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > +
> > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > +};
> > +
> > /**
> > * struct drm_gem_object_funcs - GEM object functions
> > */
> > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > */
> > int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> >
> > + /**
> > + * @sync:
> > + *
> > + * Prepare for CPU/device access. This can involve migration of
> > + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > + * the CPU caches. The range can be used to optimize the synchronization
> > + * when possible.
>
> This has gone in a very different direction from the version I sent
> out and the added generality makes me really nervous. The idea of sync
> involving migration and that the range is a mere hint are antithetical
> with Vulkan. It's a very GLish design that assumes that a BO is
> exclusively used by one of the CPU or the GPU at the same time. This
> simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> dma-buf itself) are littered with such ioctls and we're in the process
> of deleting them all.
And yes, I realize I sent this on the patch for the hook which you
intended to plumb through to dma-buf. However, I also saw it being
propagated to an ioctl and I didn't know where else to put it that had
the relevant details.
~Faith
> If the BO needs to be migrated in order to be accessed from the CPU,
> that needs to happen on map, not on some sort of begin/end. Or better
> yet, just disallow mapping such buffers. Once the client has a map,
> they are free to access from the CPU while stuff is running on the
> GPU. They have to be careful, of course, not to cause data races, but
> accessing the same BO from the CPU and GPU or even the same range is
> totally okay if you aren't racing.
>
> As a corollary, just don't map PRIME buffers.
>
> And the range really shouldn't be just a hint. With Vulkan, clients
> are regularly sub-allocating from larger memory objects. If they ask
> to flush 64B and end up flushing 64M, that's pretty bad.
>
> All we need is something which lets us trap through to the kernel for
> CPU cache management. That's all we need and that's really all it
> should do.
>
> ~Faith
>
> > + *
> > + * Returns 0 on success, -errno otherwise.
> > + *
> > + * This callback is optional.
> > + */
> > + int (*sync)(struct drm_gem_object *obj, size_t offset, size_t size,
> > + enum drm_gem_object_access_flags access);
> > +
> > /**
> > * @evict:
> > *
> > @@ -559,6 +601,9 @@ void drm_gem_unlock(struct drm_gem_object *obj);
> > int drm_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map);
> > void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map);
> >
> > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > + enum drm_gem_object_access_flags access);
> > +
> > int drm_gem_objects_lookup(struct drm_file *filp, void __user *bo_handles,
> > int count, struct drm_gem_object ***objs_out);
> > struct drm_gem_object *drm_gem_object_lookup(struct drm_file *filp, u32 handle);
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-17 14:40 ` Faith Ekstrand
@ 2025-10-17 15:26 ` Boris Brezillon
2025-10-17 15:35 ` Faith Ekstrand
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-17 15:26 UTC (permalink / raw)
To: Faith Ekstrand
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel
On Fri, 17 Oct 2025 10:40:46 -0400
Faith Ekstrand <faith@gfxstrand.net> wrote:
> On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:
> >
> > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > Prepare things for standardizing synchronization around CPU accesses
> > > of GEM buffers. This will be used to provide default
> > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > a way for drivers to add their own ioctls to synchronize CPU
> > > writes/reads when they can't do it directly from userland.
> > >
> > > v2:
> > > - New commit
> > >
> > > v3:
> > > - No changes
> > >
> > > v4:
> > > - Add Steve's R-b
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > ---
> > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 55 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > index a1a9c828938b..a1431e4f2404 100644
> > > --- a/drivers/gpu/drm/drm_gem.c
> > > +++ b/drivers/gpu/drm/drm_gem.c
> > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > }
> > > EXPORT_SYMBOL(drm_gem_vunmap);
> > >
> > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > + enum drm_gem_object_access_flags access)
> > > +{
> > > + if (obj->funcs->sync)
> > > + return obj->funcs->sync(obj, offset, size, access);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL(drm_gem_sync);
> > > +
> > > /**
> > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > * the lock on an array of GEM objects.
> > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > index 8d48d2af2649..1c33e59ab305 100644
> > > --- a/include/drm/drm_gem.h
> > > +++ b/include/drm/drm_gem.h
> > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > };
> > >
> > > +/**
> > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > + */
> > > +enum drm_gem_object_access_flags {
> > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > +
> > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > +
> > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > +
> > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > +
> > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > +
> > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > +
> > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > +};
> > > +
> > > /**
> > > * struct drm_gem_object_funcs - GEM object functions
> > > */
> > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > */
> > > int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > >
> > > + /**
> > > + * @sync:
> > > + *
> > > + * Prepare for CPU/device access. This can involve migration of
> > > + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > + * the CPU caches. The range can be used to optimize the synchronization
> > > + * when possible.
> >
> > This has gone in a very different direction from the version I sent
> > out and the added generality makes me really nervous. The idea of sync
> > involving migration and that the range is a mere hint are antithetical
> > with Vulkan. It's a very GLish design that assumes that a BO is
> > exclusively used by one of the CPU or the GPU at the same time. This
> > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > dma-buf itself) are littered with such ioctls and we're in the process
> > of deleting them all.
>
> And yes, I realize I sent this on the patch for the hook which you
> intended to plumb through to dma-buf. However, I also saw it being
> propagated to an ioctl and I didn't know where else to put it that had
> the relevant details.
>
> ~Faith
>
> > If the BO needs to be migrated in order to be accessed from the CPU,
> > that needs to happen on map, not on some sort of begin/end. Or better
> > yet, just disallow mapping such buffers. Once the client has a map,
> > they are free to access from the CPU while stuff is running on the
> > GPU. They have to be careful, of course, not to cause data races, but
> > accessing the same BO from the CPU and GPU or even the same range is
> > totally okay if you aren't racing.
> >
> > As a corollary, just don't map PRIME buffers.
> >
> > And the range really shouldn't be just a hint. With Vulkan, clients
> > are regularly sub-allocating from larger memory objects. If they ask
> > to flush 64B and end up flushing 64M, that's pretty bad.
> >
> > All we need is something which lets us trap through to the kernel for
> > CPU cache management. That's all we need and that's really all it
> > should do.
Okay, so there's actually a problem with that I think, because we can't
know how the buffer we export will be used. It can be imported by the
same driver, and we're all good, but it can also be imported by a
different driver, which decides to vmap or allow mmap() on it, and then
we have to implement the dma_buf CPU sync hooks. Unless we decide that
all exported buffers should be write-combine only? This is the very
reason I started hooking things up on the dma_buf side, because we're
not in control of who the importer of our buffers is.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-17 15:26 ` Boris Brezillon
@ 2025-10-17 15:35 ` Faith Ekstrand
2025-10-17 15:50 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Faith Ekstrand @ 2025-10-17 15:35 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel
On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Fri, 17 Oct 2025 10:40:46 -0400
> Faith Ekstrand <faith@gfxstrand.net> wrote:
>
> > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:
> > >
> > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:
> > > >
> > > > Prepare things for standardizing synchronization around CPU accesses
> > > > of GEM buffers. This will be used to provide default
> > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > writes/reads when they can't do it directly from userland.
> > > >
> > > > v2:
> > > > - New commit
> > > >
> > > > v3:
> > > > - No changes
> > > >
> > > > v4:
> > > > - Add Steve's R-b
> > > >
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > ---
> > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > > 2 files changed, 55 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > index a1a9c828938b..a1431e4f2404 100644
> > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > }
> > > > EXPORT_SYMBOL(drm_gem_vunmap);
> > > >
> > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > + enum drm_gem_object_access_flags access)
> > > > +{
> > > > + if (obj->funcs->sync)
> > > > + return obj->funcs->sync(obj, offset, size, access);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > +
> > > > /**
> > > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > * the lock on an array of GEM objects.
> > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > --- a/include/drm/drm_gem.h
> > > > +++ b/include/drm/drm_gem.h
> > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > };
> > > >
> > > > +/**
> > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > > + */
> > > > +enum drm_gem_object_access_flags {
> > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > +
> > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > +
> > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > +};
> > > > +
> > > > /**
> > > > * struct drm_gem_object_funcs - GEM object functions
> > > > */
> > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > */
> > > > int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > >
> > > > + /**
> > > > + * @sync:
> > > > + *
> > > > + * Prepare for CPU/device access. This can involve migration of
> > > > + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > > + * the CPU caches. The range can be used to optimize the synchronization
> > > > + * when possible.
> > >
> > > This has gone in a very different direction from the version I sent
> > > out and the added generality makes me really nervous. The idea of sync
> > > involving migration and that the range is a mere hint are antithetical
> > > with Vulkan. It's a very GLish design that assumes that a BO is
> > > exclusively used by one of the CPU or the GPU at the same time. This
> > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > dma-buf itself) are littered with such ioctls and we're in the process
> > > of deleting them all.
> >
> > And yes, I realize I sent this on the patch for the hook which you
> > intended to plumb through to dma-buf. However, I also saw it being
> > propagated to an ioctl and I didn't know where else to put it that had
> > the relevant details.
> >
> > ~Faith
> >
> > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > that needs to happen on map, not on some sort of begin/end. Or better
> > > yet, just disallow mapping such buffers. Once the client has a map,
> > > they are free to access from the CPU while stuff is running on the
> > > GPU. They have to be careful, of course, not to cause data races, but
> > > accessing the same BO from the CPU and GPU or even the same range is
> > > totally okay if you aren't racing.
> > >
> > > As a corollary, just don't map PRIME buffers.
> > >
> > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > are regularly sub-allocating from larger memory objects. If they ask
> > > to flush 64B and end up flushing 64M, that's pretty bad.
> > >
> > > All we need is something which lets us trap through to the kernel for
> > > CPU cache management. That's all we need and that's really all it
> > > should do.
>
> Okay, so there's actually a problem with that I think, because we can't
> know how the buffer we export will be used. It can be imported by the
> same driver, and we're all good, but it can also be imported by a
> different driver, which decides to vmap or allow mmap() on it, and then
> we have to implement the dma_buf CPU sync hooks. Unless we decide that
> all exported buffers should be write-combine only? This is the very
> reason I started hooking things up on the dma_buf side, because we're
> not in control of who the importer of our buffers is.
Exported buffers should be WC-only. We could try to get creative but
the moment we let the lack of coherency leak to other processes,
potentially to other drivers, we're in a world of hurt. Even with the
dma-buf begin/end hooks, if it's imported into a driver that does
Vulkan, those hooks don't make sense and we're screwed. And, yes, I
know panvk is the only Vulkan implementation you're going to see on a
system with an Arm GPU, but thinking about things in the general case
across all of DRM, exporting non-coherent memory in 2025 is just
cursed.
~Faith
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-17 15:35 ` Faith Ekstrand
@ 2025-10-17 15:50 ` Boris Brezillon
2025-10-17 15:57 ` Faith Ekstrand
0 siblings, 1 reply; 40+ messages in thread
From: Boris Brezillon @ 2025-10-17 15:50 UTC (permalink / raw)
To: Faith Ekstrand
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, Matt Coster
+Matt for my comment on PVR having the same issue.
On Fri, 17 Oct 2025 11:35:54 -0400
Faith Ekstrand <faith@gfxstrand.net> wrote:
> On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Fri, 17 Oct 2025 10:40:46 -0400
> > Faith Ekstrand <faith@gfxstrand.net> wrote:
> >
> > > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:
> > > >
> > > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > > <boris.brezillon@collabora.com> wrote:
> > > > >
> > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > of GEM buffers. This will be used to provide default
> > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > writes/reads when they can't do it directly from userland.
> > > > >
> > > > > v2:
> > > > > - New commit
> > > > >
> > > > > v3:
> > > > > - No changes
> > > > >
> > > > > v4:
> > > > > - Add Steve's R-b
> > > > >
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > ---
> > > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > > > 2 files changed, 55 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > > }
> > > > > EXPORT_SYMBOL(drm_gem_vunmap);
> > > > >
> > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > + enum drm_gem_object_access_flags access)
> > > > > +{
> > > > > + if (obj->funcs->sync)
> > > > > + return obj->funcs->sync(obj, offset, size, access);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > +
> > > > > /**
> > > > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > > * the lock on an array of GEM objects.
> > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > --- a/include/drm/drm_gem.h
> > > > > +++ b/include/drm/drm_gem.h
> > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > > > + */
> > > > > +enum drm_gem_object_access_flags {
> > > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > > +};
> > > > > +
> > > > > /**
> > > > > * struct drm_gem_object_funcs - GEM object functions
> > > > > */
> > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > > */
> > > > > int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > > >
> > > > > + /**
> > > > > + * @sync:
> > > > > + *
> > > > > + * Prepare for CPU/device access. This can involve migration of
> > > > > + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > > > + * the CPU caches. The range can be used to optimize the synchronization
> > > > > + * when possible.
> > > >
> > > > This has gone in a very different direction from the version I sent
> > > > out and the added generality makes me really nervous. The idea of sync
> > > > involving migration and that the range is a mere hint are antithetical
> > > > with Vulkan. It's a very GLish design that assumes that a BO is
> > > > exclusively used by one of the CPU or the GPU at the same time. This
> > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > > dma-buf itself) are littered with such ioctls and we're in the process
> > > > of deleting them all.
> > >
> > > And yes, I realize I sent this on the patch for the hook which you
> > > intended to plumb through to dma-buf. However, I also saw it being
> > > propagated to an ioctl and I didn't know where else to put it that had
> > > the relevant details.
> > >
> > > ~Faith
> > >
> > > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > > that needs to happen on map, not on some sort of begin/end. Or better
> > > > yet, just disallow mapping such buffers. Once the client has a map,
> > > > they are free to access from the CPU while stuff is running on the
> > > > GPU. They have to be careful, of course, not to cause data races, but
> > > > accessing the same BO from the CPU and GPU or even the same range is
> > > > totally okay if you aren't racing.
> > > >
> > > > As a corollary, just don't map PRIME buffers.
> > > >
> > > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > > are regularly sub-allocating from larger memory objects. If they ask
> > > > to flush 64B and end up flushing 64M, that's pretty bad.
> > > >
> > > > All we need is something which lets us trap through to the kernel for
> > > > CPU cache management. That's all we need and that's really all it
> > > > should do.
> >
> > Okay, so there's actually a problem with that I think, because we can't
> > know how the buffer we export will be used. It can be imported by the
> > same driver, and we're all good, but it can also be imported by a
> > different driver, which decides to vmap or allow mmap() on it, and then
> > we have to implement the dma_buf CPU sync hooks. Unless we decide that
> > all exported buffers should be write-combine only? This is the very
> > reason I started hooking things up on the dma_buf side, because we're
> > not in control of who the importer of our buffers is.
>
> Exported buffers should be WC-only. We could try to get creative but
> the moment we let the lack of coherency leak to other processes,
> potentially to other drivers, we're in a world of hurt. Even with the
> dma-buf begin/end hooks, if it's imported into a driver that does
> Vulkan, those hooks don't make sense and we're screwed.
Well, yeah, the 'entire-buf' granularity is a problem, indeed, and
there's no way around it with the current dma-buf API, which is why I
prevented external bufs from being mapped (AKA not host-visible in
Vulkan's term). But I really thought imported buffers coming from
panthor should be mapped cached.
> And, yes, I
> know panvk is the only Vulkan implementation you're going to see on a
> system with an Arm GPU, but thinking about things in the general case
> across all of DRM, exporting non-coherent memory in 2025 is just
> cursed.
Hm, okay. If that's the way to go, then we should enforce
!WB_MMAP || !PRIVATE
in panthor, and fail the export of a WB_MMAP buffer in panfrost (we
don't have a way to know that a buffer is private there).
I'll go and revisit the patchset to do that. I guess PVR needs fixing
too, because I haven't seen anything to prevent `CACHED + EXPORTABLE`
there.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-17 15:50 ` Boris Brezillon
@ 2025-10-17 15:57 ` Faith Ekstrand
2025-10-17 16:35 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Faith Ekstrand @ 2025-10-17 15:57 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, Matt Coster
On Fri, Oct 17, 2025 at 11:50 AM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> +Matt for my comment on PVR having the same issue.
>
> On Fri, 17 Oct 2025 11:35:54 -0400
> Faith Ekstrand <faith@gfxstrand.net> wrote:
>
> > On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > On Fri, 17 Oct 2025 10:40:46 -0400
> > > Faith Ekstrand <faith@gfxstrand.net> wrote:
> > >
> > > > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:
> > > > >
> > > > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > > > <boris.brezillon@collabora.com> wrote:
> > > > > >
> > > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > > of GEM buffers. This will be used to provide default
> > > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > > writes/reads when they can't do it directly from userland.
> > > > > >
> > > > > > v2:
> > > > > > - New commit
> > > > > >
> > > > > > v3:
> > > > > > - No changes
> > > > > >
> > > > > > v4:
> > > > > > - Add Steve's R-b
> > > > > >
> > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > > ---
> > > > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > > > > 2 files changed, 55 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > > > }
> > > > > > EXPORT_SYMBOL(drm_gem_vunmap);
> > > > > >
> > > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > > + enum drm_gem_object_access_flags access)
> > > > > > +{
> > > > > > + if (obj->funcs->sync)
> > > > > > + return obj->funcs->sync(obj, offset, size, access);
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > > +
> > > > > > /**
> > > > > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > > > * the lock on an array of GEM objects.
> > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > > --- a/include/drm/drm_gem.h
> > > > > > +++ b/include/drm/drm_gem.h
> > > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > > > };
> > > > > >
> > > > > > +/**
> > > > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > > > > + */
> > > > > > +enum drm_gem_object_access_flags {
> > > > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > > +
> > > > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > > +
> > > > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > > > +
> > > > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > > +
> > > > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > > +
> > > > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > > +
> > > > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > > > +};
> > > > > > +
> > > > > > /**
> > > > > > * struct drm_gem_object_funcs - GEM object functions
> > > > > > */
> > > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > > > */
> > > > > > int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > > > >
> > > > > > + /**
> > > > > > + * @sync:
> > > > > > + *
> > > > > > + * Prepare for CPU/device access. This can involve migration of
> > > > > > + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > > > > + * the CPU caches. The range can be used to optimize the synchronization
> > > > > > + * when possible.
> > > > >
> > > > > This has gone in a very different direction from the version I sent
> > > > > out and the added generality makes me really nervous. The idea of sync
> > > > > involving migration and that the range is a mere hint are antithetical
> > > > > with Vulkan. It's a very GLish design that assumes that a BO is
> > > > > exclusively used by one of the CPU or the GPU at the same time. This
> > > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > > > dma-buf itself) are littered with such ioctls and we're in the process
> > > > > of deleting them all.
> > > >
> > > > And yes, I realize I sent this on the patch for the hook which you
> > > > intended to plumb through to dma-buf. However, I also saw it being
> > > > propagated to an ioctl and I didn't know where else to put it that had
> > > > the relevant details.
> > > >
> > > > ~Faith
> > > >
> > > > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > > > that needs to happen on map, not on some sort of begin/end. Or better
> > > > > yet, just disallow mapping such buffers. Once the client has a map,
> > > > > they are free to access from the CPU while stuff is running on the
> > > > > GPU. They have to be careful, of course, not to cause data races, but
> > > > > accessing the same BO from the CPU and GPU or even the same range is
> > > > > totally okay if you aren't racing.
> > > > >
> > > > > As a corollary, just don't map PRIME buffers.
> > > > >
> > > > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > > > are regularly sub-allocating from larger memory objects. If they ask
> > > > > to flush 64B and end up flushing 64M, that's pretty bad.
> > > > >
> > > > > All we need is something which lets us trap through to the kernel for
> > > > > CPU cache management. That's all we need and that's really all it
> > > > > should do.
> > >
> > > Okay, so there's actually a problem with that I think, because we can't
> > > know how the buffer we export will be used. It can be imported by the
> > > same driver, and we're all good, but it can also be imported by a
> > > different driver, which decides to vmap or allow mmap() on it, and then
> > > we have to implement the dma_buf CPU sync hooks. Unless we decide that
> > > all exported buffers should be write-combine only? This is the very
> > > reason I started hooking things up on the dma_buf side, because we're
> > > not in control of who the importer of our buffers is.
> >
> > Exported buffers should be WC-only. We could try to get creative but
> > the moment we let the lack of coherency leak to other processes,
> > potentially to other drivers, we're in a world of hurt. Even with the
> > dma-buf begin/end hooks, if it's imported into a driver that does
> > Vulkan, those hooks don't make sense and we're screwed.
>
> Well, yeah, the 'entire-buf' granularity is a problem, indeed,
If all that's being done is cache flushing, it's kind of okay. It's a
big hammer but it's okay. However, if the driver is doing literally
anything else in begin/end, it's all a lie the moment you allow that
buffer to be mapped in Vulkan. The client may be reading from the CPU
for GPU download in one subrange, writing from the CPU for GPU upload
in another subrange, and reading/writing from the GPU in another
subrange all at the same time. That's totally incompatible with the
dma-buf begin/end model.
> and
> there's no way around it with the current dma-buf API, which is why I
> prevented external bufs from being mapped (AKA not host-visible in
> Vulkan's term). But I really thought imported buffers coming from
> panthor should be mapped cached.
Yeah. So we could potentially allow WB maps if the client requests an
export via OPAQUE_FD since we know a priori that such a buffer will
only ever be re-imported into panfrost/panvk. However, I really don't
think that's a common enough case to be worth optimizing just yet.
> > And, yes, I
> > know panvk is the only Vulkan implementation you're going to see on a
> > system with an Arm GPU, but thinking about things in the general case
> > across all of DRM, exporting non-coherent memory in 2025 is just
> > cursed.
>
> Hm, okay. If that's the way to go, then we should enforce
>
> !WB_MMAP || !PRIVATE
>
> in panthor, and fail the export of a WB_MMAP buffer in panfrost (we
> don't have a way to know that a buffer is private there).
Yeah, I'm a fan of that for now.
> I'll go and revisit the patchset to do that. I guess PVR needs fixing
> too, because I haven't seen anything to prevent `CACHED + EXPORTABLE`
> there.
Quite possible. I'm also not sure about msm, either.
~Faith
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-17 15:57 ` Faith Ekstrand
@ 2025-10-17 16:35 ` Boris Brezillon
0 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-17 16:35 UTC (permalink / raw)
To: Faith Ekstrand
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
kernel, Matt Coster
On Fri, 17 Oct 2025 11:57:08 -0400
Faith Ekstrand <faith@gfxstrand.net> wrote:
> On Fri, Oct 17, 2025 at 11:50 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > +Matt for my comment on PVR having the same issue.
> >
> > On Fri, 17 Oct 2025 11:35:54 -0400
> > Faith Ekstrand <faith@gfxstrand.net> wrote:
> >
> > > On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:
> > > >
> > > > On Fri, 17 Oct 2025 10:40:46 -0400
> > > > Faith Ekstrand <faith@gfxstrand.net> wrote:
> > > >
> > > > > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > > > > <boris.brezillon@collabora.com> wrote:
> > > > > > >
> > > > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > > > of GEM buffers. This will be used to provide default
> > > > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > > > writes/reads when they can't do it directly from userland.
> > > > > > >
> > > > > > > v2:
> > > > > > > - New commit
> > > > > > >
> > > > > > > v3:
> > > > > > > - No changes
> > > > > > >
> > > > > > > v4:
> > > > > > > - Add Steve's R-b
> > > > > > >
> > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > > > ---
> > > > > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > > > > > 2 files changed, 55 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > > > > }
> > > > > > > EXPORT_SYMBOL(drm_gem_vunmap);
> > > > > > >
> > > > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > > > + enum drm_gem_object_access_flags access)
> > > > > > > +{
> > > > > > > + if (obj->funcs->sync)
> > > > > > > + return obj->funcs->sync(obj, offset, size, access);
> > > > > > > +
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > > > +
> > > > > > > /**
> > > > > > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > > > > * the lock on an array of GEM objects.
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > > > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > > > > };
> > > > > > >
> > > > > > > +/**
> > > > > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > > > > > + */
> > > > > > > +enum drm_gem_object_access_flags {
> > > > > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > > > +
> > > > > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > > > +
> > > > > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > > > > +
> > > > > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > > > +
> > > > > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > > > +
> > > > > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > > > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > > > +
> > > > > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > > > > +};
> > > > > > > +
> > > > > > > /**
> > > > > > > * struct drm_gem_object_funcs - GEM object functions
> > > > > > > */
> > > > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > > > > */
> > > > > > > int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > > > > >
> > > > > > > + /**
> > > > > > > + * @sync:
> > > > > > > + *
> > > > > > > + * Prepare for CPU/device access. This can involve migration of
> > > > > > > + * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > > > > > + * the CPU caches. The range can be used to optimize the synchronization
> > > > > > > + * when possible.
> > > > > >
> > > > > > This has gone in a very different direction from the version I sent
> > > > > > out and the added generality makes me really nervous. The idea of sync
> > > > > > involving migration and that the range is a mere hint are antithetical
> > > > > > with Vulkan. It's a very GLish design that assumes that a BO is
> > > > > > exclusively used by one of the CPU or the GPU at the same time. This
> > > > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > > > > dma-buf itself) are littered with such ioctls and we're in the process
> > > > > > of deleting them all.
> > > > >
> > > > > And yes, I realize I sent this on the patch for the hook which you
> > > > > intended to plumb through to dma-buf. However, I also saw it being
> > > > > propagated to an ioctl and I didn't know where else to put it that had
> > > > > the relevant details.
> > > > >
> > > > > ~Faith
> > > > >
> > > > > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > > > > that needs to happen on map, not on some sort of begin/end. Or better
> > > > > > yet, just disallow mapping such buffers. Once the client has a map,
> > > > > > they are free to access from the CPU while stuff is running on the
> > > > > > GPU. They have to be careful, of course, not to cause data races, but
> > > > > > accessing the same BO from the CPU and GPU or even the same range is
> > > > > > totally okay if you aren't racing.
> > > > > >
> > > > > > As a corollary, just don't map PRIME buffers.
> > > > > >
> > > > > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > > > > are regularly sub-allocating from larger memory objects. If they ask
> > > > > > to flush 64B and end up flushing 64M, that's pretty bad.
> > > > > >
> > > > > > All we need is something which lets us trap through to the kernel for
> > > > > > CPU cache management. That's all we need and that's really all it
> > > > > > should do.
> > > >
> > > > Okay, so there's actually a problem with that I think, because we can't
> > > > know how the buffer we export will be used. It can be imported by the
> > > > same driver, and we're all good, but it can also be imported by a
> > > > different driver, which decides to vmap or allow mmap() on it, and then
> > > > we have to implement the dma_buf CPU sync hooks. Unless we decide that
> > > > all exported buffers should be write-combine only? This is the very
> > > > reason I started hooking things up on the dma_buf side, because we're
> > > > not in control of who the importer of our buffers is.
> > >
> > > Exported buffers should be WC-only. We could try to get creative but
> > > the moment we let the lack of coherency leak to other processes,
> > > potentially to other drivers, we're in a world of hurt. Even with the
> > > dma-buf begin/end hooks, if it's imported into a driver that does
> > > Vulkan, those hooks don't make sense and we're screwed.
> >
> > Well, yeah, the 'entire-buf' granularity is a problem, indeed,
>
> If all that's being done is cache flushing, it's kind of okay. It's a
> big hammer but it's okay. However, if the driver is doing literally
> anything else in begin/end, it's all a lie the moment you allow that
> buffer to be mapped in Vulkan. The client may be reading from the CPU
> for GPU download in one subrange, writing from the CPU for GPU upload
> in another subrange, and reading/writing from the GPU in another
> subrange all at the same time. That's totally incompatible with the
> dma-buf begin/end model.
>
> > and
> > there's no way around it with the current dma-buf API, which is why I
> > prevented external bufs from being mapped (AKA not host-visible in
> > Vulkan's term). But I really thought imported buffers coming from
> > panthor should be mapped cached.
>
> Yeah. So we could potentially allow WB maps if the client requests an
> export via OPAQUE_FD since we know a priori that such a buffer will
> only ever be re-imported into panfrost/panvk. However, I really don't
> think that's a common enough case to be worth optimizing just yet.
>
> > > And, yes, I
> > > know panvk is the only Vulkan implementation you're going to see on a
> > > system with an Arm GPU, but thinking about things in the general case
> > > across all of DRM, exporting non-coherent memory in 2025 is just
> > > cursed.
> >
> > Hm, okay. If that's the way to go, then we should enforce
> >
> > !WB_MMAP || !PRIVATE
> >
> > in panthor, and fail the export of a WB_MMAP buffer in panfrost (we
> > don't have a way to know that a buffer is private there).
>
> Yeah, I'm a fan of that for now.
Might also require more changes when the GPU is IO-coherent, because
IO-coherency is a per-device thing on Arm. Right now we forcibly map
things WB even if the user didn't ask for it when the GPU is coherent,
but the importer might not be IO-coherent. I think it's fine in Panthor
because we can properly define the shareability/cachebility attributes,
but on older Mali gens, I remember hitting some issues. For instance, we
had issues when mapping things uncached on some IO-coherent amlogic
integration of the G52. I've managed to disable coherency on my VIM3
with a few hacks (switching to the new page table format was one of
them, but I'm not sure what I did is portable to older gens that only
support the old page table format), but I'd have to make sure this can
be done on a per mapping basis. Steve probably knows more about that.
TLDR; maybe we should focus on Panthor first, so we don't block this MR
on some changes breaking old HW in Panfrost.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 13:07 ` Boris Brezillon
@ 2025-10-29 9:30 ` Maxime Ripard
2025-10-30 14:07 ` Boris Brezillon
0 siblings, 1 reply; 40+ messages in thread
From: Maxime Ripard @ 2025-10-29 9:30 UTC (permalink / raw)
To: Boris Brezillon
Cc: Steven Price, dri-devel, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
[-- Attachment #1: Type: text/plain, Size: 5367 bytes --]
On Thu, Oct 16, 2025 at 03:07:04PM +0200, Boris Brezillon wrote:
> On Thu, 16 Oct 2025 14:57:08 +0200
> Boris Brezillon <boris.brezillon@collabora.com> wrote:
>
> > On Thu, 16 Oct 2025 10:13:25 +0200
> > Maxime Ripard <mripard@kernel.org> wrote:
> >
> > > Hi,
> > >
> > > On Wed, Oct 15, 2025 at 06:03:14PM +0200, Boris Brezillon wrote:
> > > > Prepare things for standardizing synchronization around CPU accesses
> > > > of GEM buffers. This will be used to provide default
> > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > writes/reads when they can't do it directly from userland.
> > > >
> > > > v2:
> > > > - New commit
> > > >
> > > > v3:
> > > > - No changes
> > > >
> > > > v4:
> > > > - Add Steve's R-b
> > > >
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > ---
> > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > > 2 files changed, 55 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > index a1a9c828938b..a1431e4f2404 100644
> > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > }
> > > > EXPORT_SYMBOL(drm_gem_vunmap);
> > > >
> > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > + enum drm_gem_object_access_flags access)
> > > > +{
> > > > + if (obj->funcs->sync)
> > > > + return obj->funcs->sync(obj, offset, size, access);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > +
> > > > /**
> > > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > * the lock on an array of GEM objects.
> > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > --- a/include/drm/drm_gem.h
> > > > +++ b/include/drm/drm_gem.h
> > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > };
> > > >
> > > > +/**
> > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > >
> > > Treating an enum as a bitmask is a bit weird to me. I'd say either have
> > > a bitmask with BIT(enum values), or no enum at all.
> >
> > I'll drop the enum and make it pure defines.
> >
> > >
> > > > + */
> > > > +enum drm_gem_object_access_flags {
> > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > +
> > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > >
> > > Do we really want to have to variants with the same discriminant? If so,
> > > we should document why it's something we want.
> > >
> > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > +
> > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > +
> > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > >
> > > Same thing.
> > >
> > > Or is it that you encode both the direction and access type, and have a
> > > mask to isolate each?
> >
> > This ^.
> >
> > >
> > > If so, we should really move it out from an enum into defines, or treat each
> > > separately like dma_sync_does.
> >
> > Sure, I can do that.
>
> Actually, looking at the enum just above the one added in this patch
> (drm_gem_object_status), it seems that it has the same flaws, and I
> think it was the reason I went for this enum-based approach, because I
> tend to be consistent with the code base I'm modifying.
>
> Now, I get that defining flags with an enum and then composing those to
> then pass the composition to some helper pretending it's still an enum
> only works in C (because with C you can do anything you want :D), and
> probably not if you're in pedantic mode. But if we want to enforce
> that, we should probably fix the existing code base, otherwise this
> will keep happening ;-). And no, before you ask, I'm not volunteering
> for this :P.
I'm fine with it not being totally consistent with the other enums
around, if anything because those aren't consistent with the typical way
we use enums in C.
And I don't expect to fix everything else, especially if you don't have
the time. It's still not worth creating more work down the line when
you'll volunteer ;)
Maxime
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 273 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-29 9:30 ` Maxime Ripard
@ 2025-10-30 14:07 ` Boris Brezillon
0 siblings, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:07 UTC (permalink / raw)
To: Maxime Ripard
Cc: Steven Price, dri-devel, Maarten Lankhorst, Thomas Zimmermann,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
On Wed, 29 Oct 2025 10:30:56 +0100
Maxime Ripard <mripard@kernel.org> wrote:
> On Thu, Oct 16, 2025 at 03:07:04PM +0200, Boris Brezillon wrote:
> > On Thu, 16 Oct 2025 14:57:08 +0200
> > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> >
> > > On Thu, 16 Oct 2025 10:13:25 +0200
> > > Maxime Ripard <mripard@kernel.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Wed, Oct 15, 2025 at 06:03:14PM +0200, Boris Brezillon wrote:
> > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > of GEM buffers. This will be used to provide default
> > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > writes/reads when they can't do it directly from userland.
> > > > >
> > > > > v2:
> > > > > - New commit
> > > > >
> > > > > v3:
> > > > > - No changes
> > > > >
> > > > > v4:
> > > > > - Add Steve's R-b
> > > > >
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > ---
> > > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++++++++++++
> > > > > 2 files changed, 55 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > > }
> > > > > EXPORT_SYMBOL(drm_gem_vunmap);
> > > > >
> > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > + enum drm_gem_object_access_flags access)
> > > > > +{
> > > > > + if (obj->funcs->sync)
> > > > > + return obj->funcs->sync(obj, offset, size, access);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > +
> > > > > /**
> > > > > * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > > * the lock on an array of GEM objects.
> > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > --- a/include/drm/drm_gem.h
> > > > > +++ b/include/drm/drm_gem.h
> > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > > DRM_GEM_OBJECT_ACTIVE = BIT(2),
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > >
> > > > Treating an enum as a bitmask is a bit weird to me. I'd say either have
> > > > a bitmask with BIT(enum values), or no enum at all.
> > >
> > > I'll drop the enum and make it pure defines.
> > >
> > > >
> > > > > + */
> > > > > +enum drm_gem_object_access_flags {
> > > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > + DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > + DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > + DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > >
> > > > Do we really want to have to variants with the same discriminant? If so,
> > > > we should document why it's something we want.
> > > >
> > > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > + DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > + DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > > + DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > + DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > +
> > > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > >
> > > > Same thing.
> > > >
> > > > Or is it that you encode both the direction and access type, and have a
> > > > mask to isolate each?
> > >
> > > This ^.
> > >
> > > >
> > > > If so, we should really move it out from an enum into defines, or treat each
> > > > separately like dma_sync_does.
> > >
> > > Sure, I can do that.
> >
> > Actually, looking at the enum just above the one added in this patch
> > (drm_gem_object_status), it seems that it has the same flaws, and I
> > think it was the reason I went for this enum-based approach, because I
> > tend to be consistent with the code base I'm modifying.
> >
> > Now, I get that defining flags with an enum and then composing those to
> > then pass the composition to some helper pretending it's still an enum
> > only works in C (because with C you can do anything you want :D), and
> > probably not if you're in pedantic mode. But if we want to enforce
> > that, we should probably fix the existing code base, otherwise this
> > will keep happening ;-). And no, before you ask, I'm not volunteering
> > for this :P.
>
> I'm fine with it not being totally consistent with the other enums
> around, if anything because those aren't consistent with the typical way
> we use enums in C.
I ended up dropping the generic ::sync() hook, and the enum that goes
with it. There's a new enum at the drm_gem_shmem level, but this time
it's a true enum, not a bitmask.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
2025-10-16 11:42 ` Thomas Zimmermann
2025-10-16 12:24 ` Boris Brezillon
@ 2025-10-30 14:11 ` Boris Brezillon
1 sibling, 0 replies; 40+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:11 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
David Airlie, Simona Vetter, Faith Ekstrand, kernel
Hi Thomas,
On Thu, 16 Oct 2025 13:42:59 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:
> > Honestly, I'm not too sure why this is a problem if this hook is
> > optional. If it turns out to be too simple for more complex use cases
> > others have, it can still be extended when those drivers transition to
> > this ::sync() approach, as no in-kernel API is immutable. And in the
> > meantime, we have a solution for two drivers that doesn't imply
> > duplicating a bunch of drm_prime boiler-plate that's otherwise rather
> > generic.
>
> The prime code you'd have to duplicate is just 10 lines, plus some small
> per-driver code. Most of that being data-structure inits.
I went for this approach, with a few simple changes to drm_prime.c to
be able to re-use more of the prime boilerplate.
>
> I want to point out that I'm not opposing the general idea of GEM sync,
> but I think it should get more feedback from others. It's supposed to be
> a generic interface after all. Hence I was asking to put all this into a
> separate series.
New version sent, with more recipients this time. Hopefully I get some
answers to the dma_buf::{begin,end}_cpu_access() questions I have.
Regards,
Boris
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2025-10-30 14:11 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 01/14] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
2025-10-16 8:13 ` Maxime Ripard
2025-10-16 12:57 ` Boris Brezillon
2025-10-16 13:07 ` Boris Brezillon
2025-10-29 9:30 ` Maxime Ripard
2025-10-30 14:07 ` Boris Brezillon
2025-10-16 8:32 ` Thomas Zimmermann
2025-10-16 9:40 ` Boris Brezillon
2025-10-16 9:58 ` Thomas Zimmermann
2025-10-16 10:32 ` Boris Brezillon
2025-10-16 11:42 ` Thomas Zimmermann
2025-10-16 12:24 ` Boris Brezillon
2025-10-30 14:11 ` Boris Brezillon
2025-10-17 14:32 ` Faith Ekstrand
2025-10-17 14:40 ` Faith Ekstrand
2025-10-17 15:26 ` Boris Brezillon
2025-10-17 15:35 ` Faith Ekstrand
2025-10-17 15:50 ` Boris Brezillon
2025-10-17 15:57 ` Faith Ekstrand
2025-10-17 16:35 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 03/14] drm/prime: Provide default ::{begin, end}_cpu_access() implementations Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 04/14] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 05/14] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 06/14] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 07/14] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 08/14] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 09/14] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-15 16:06 ` Steven Price
2025-10-15 16:03 ` [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
2025-10-16 8:42 ` Marcin Ślusarz
2025-10-16 9:52 ` Boris Brezillon
2025-10-16 10:02 ` Marcin Ślusarz
2025-10-16 10:35 ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 12/14] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-15 16:06 ` Steven Price
2025-10-15 16:03 ` [PATCH v4 14/14] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.