* [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing
@ 2025-10-30 14:05 Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
                   ` (15 more replies)
  0 siblings, 16 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

This series implements cached maps and explicit flushing for both panfrost
and panthor. To avoid code/bug duplication, the tricky guts of the cache
flushing ioctl, which walk the sg list, are broken out into a new common
shmem helper that can be used by any driver.

The PanVK MR to use this lives here:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385

A few words about this v5: Faith and I discussed a few cache
maintenance aspects that were bothering me, and she suggested forbidding
exports of CPU-cacheable BOs to simplify things. But while doing that I
realized this was still broken on systems where the Mali GPU is coherent
but other potential importers are not, because in that case, even BOs
declared without WB_MMAP are upgraded to CPU-cacheable. I first
considered making those truly uncached, but that's not possible because
coherency is flagged at the device level and making a BO non-cacheable
on these systems messes with how shmem zeroes out the pages at alloc
time (with a temporary cached mapping) and how Arm64 lazily defers
flushes. Because dma_map_sgtable() ends up being a NOP if the device is
coherent, we end up with dirty cachelines hanging around even after
we've created an uncached mapping to the same memory region, and this
leads to corruptions later on, or even worse, potential data leaks
because the uncached mapping can access memory before it's been zeroed
out.

TL;DR: CPU cache maintenance is a mess on Arm unless everything is
coherent, and we'll have to sort it out at some point, but I believe
this is orthogonal to us implementing proper
dma_buf_ops::{begin,end}_cpu_access(), so I eventually decided to
implement dma_buf_ops::{begin,end}_cpu_access() instead of pretending
we're good if we only export CPU-uncached BOs (which we can't really
do on systems where Mali is coherent, see above). I've hacked an
IGT test to make sure this does the right thing, and it seems to work.
Now, the real question is: is this the right thing to do? I basically
went for the system dma_heap approach where not only the exporter, but
also all importers have a dma_sync_sgtable_for_xxx() called on their
sgtable to prepare/end the CPU access. This should work if only CPU
cache maintenance is involved, but as soon as the importer needs more
than that (copying memory around to make it CPU/device visible), it's
going to be problematic. The only way to do that properly would be
to add {begin,end}_cpu_access() hooks to dma_buf_attach_ops and let
the dma_buf core walk the importers to forward CPU access requests.
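
For reference, these hooks are exercised from userspace through the
existing DMA_BUF_IOCTL_SYNC ioctl. A minimal sketch of how a CPU user
of an exported (and mmap()ed) buffer would bracket its accesses, error
handling trimmed:

  #include <sys/ioctl.h>
  #include <linux/dma-buf.h>

  static int cpu_rw(int dmabuf_fd)
  {
          struct dma_buf_sync sync = {
                  .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
          };

          /* Calls the exporter's begin_cpu_access() to invalidate stale
           * CPU cachelines.
           */
          if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync))
                  return -1;

          /* ... CPU reads/writes through the mmap()ed pointer ... */

          /* Calls end_cpu_access() to flush dirty CPU cachelines. */
          sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
          return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
  }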

Sorry for the noise if you've been Cc-ed and don't care, but my goal
is to gather feedback on what exactly is expected from GPU drivers
exporting their CPU-cacheable buffers, and why so many drivers get
away with no dma_buf_ops::xxx_cpu_access() hooks, or very simple
ones that don't cover the cases I described above.

Changes in v2:
- Expose the coherency so userspace can know when it should skip cache
  maintenance
- Hook things up at the drm_gem_object_funcs level so dma-buf cpu_prep
  hooks can be implemented generically
- Revisit the semantics of the flags passed to gem_sync()
- Add BO_QUERY_INFO ioctls to query BO flags on imported objects and
  let the UMD know when cache maintenance is needed on those

Changes in v3:
- New patch to fix panthor_gpu_coherency_set()
- No other major changes, check each patch changelog for more details

Changes in v4:
- Two trivial fixes, check each patch changelog for more details

Changes in v5:
- Add a way to overload dma_buf_ops while still relying on the drm_prime
  boilerplate
- Add default shmem implementation for
  dma_buf_ops::{begin,end}_cpu_access()
- Provide custom dma_buf_ops to deal with CPU cache flushes around CPU
  accesses when the BO is CPU-cacheable
- Go back to a version of drm_gem_shmem_sync() that only deals with
  cache maintenance, and adjust the semantics to make it clear this is
  the only thing it cares about
- Adjust the BO_SYNC ioctls according to the new drm_gem_shmem_sync()
  semantics

Boris Brezillon (10):
  drm/prime: Simplify life of drivers needing custom dma_buf_ops
  drm/shmem: Provide a generic {begin,end}_cpu_access() implementation
  drm/panthor: Provide a custom dma_buf implementation
  drm/panthor: Fix panthor_gpu_coherency_set()
  drm/panthor: Expose the selected coherency protocol to the UMD
  drm/panthor: Add a PANTHOR_BO_SYNC ioctl
  drm/panthor: Add an ioctl to query BO flags
  drm/panfrost: Provide a custom dma_buf implementation
  drm/panfrost: Expose the selected coherency protocol to the UMD
  drm/panfrost: Add an ioctl to query BO flags

Faith Ekstrand (5):
  drm/shmem: Add a drm_gem_shmem_sync() helper
  drm/panthor: Bump the driver version to 1.6
  drm/panfrost: Add a PANFROST_SYNC_BO ioctl
  drm/panfrost: Add flag to map GEM object Write-Back Cacheable
  drm/panfrost: Bump the driver version to 1.6

Loïc Molinari (1):
  drm/panthor: Add flag to map GEM object Write-Back Cacheable

 drivers/gpu/drm/drm_gem_shmem_helper.c     | 207 +++++++++++++++++++++
 drivers/gpu/drm/drm_prime.c                |  14 +-
 drivers/gpu/drm/panfrost/panfrost_device.h |   1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    |  98 +++++++++-
 drivers/gpu/drm/panfrost/panfrost_gem.c    |  73 ++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h    |   9 +
 drivers/gpu/drm/panfrost/panfrost_gpu.c    |  26 ++-
 drivers/gpu/drm/panfrost/panfrost_regs.h   |  10 +-
 drivers/gpu/drm/panthor/panthor_device.c   |  10 +-
 drivers/gpu/drm/panthor/panthor_drv.c      |  80 +++++++-
 drivers/gpu/drm/panthor/panthor_gem.c      |  77 +++++++-
 drivers/gpu/drm/panthor/panthor_gem.h      |   6 +
 drivers/gpu/drm/panthor/panthor_gpu.c      |   2 +-
 drivers/gpu/drm/panthor/panthor_sched.c    |  18 +-
 include/drm/drm_drv.h                      |   8 +
 include/drm/drm_gem_shmem_helper.h         |  24 +++
 include/uapi/drm/panfrost_drm.h            |  76 +++++++-
 include/uapi/drm/panthor_drm.h             | 160 +++++++++++++++-
 18 files changed, 875 insertions(+), 24 deletions(-)

-- 
2.51.0



* [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:25   ` Christian König
  2025-10-30 14:05 ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

drm_gem_is_prime_exported_dma_buf() checks dma_buf->ops against
drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
driver implements custom dma_buf_ops. Instead of duplicating a bunch
of helpers to work around it, let's provide a way for drivers to
expose their custom dma_buf_ops so the core prime helpers can rely on
that instead of hardcoding &drm_gem_prime_dmabuf_ops.
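
To illustrate, a driver keeping the stock PRIME behavior while extending
the dma_buf_ops it exposes would look roughly like this (hypothetical
"foo" driver; the panthor/panfrost patches later in this series follow
the same pattern):

  static const struct dma_buf_ops foo_dma_buf_ops = {
          /* Stock drm_gem helpers... */
          .attach = drm_gem_map_attach,
          .detach = drm_gem_map_detach,
          .map_dma_buf = drm_gem_map_dma_buf,
          .unmap_dma_buf = drm_gem_unmap_dma_buf,
          .release = drm_gem_dmabuf_release,
          .mmap = drm_gem_dmabuf_mmap,
          .vmap = drm_gem_dmabuf_vmap,
          .vunmap = drm_gem_dmabuf_vunmap,
          /* ...plus whatever driver-specific hooks are needed, e.g.
           * begin_cpu_access()/end_cpu_access().
           */
  };

  static const struct dma_buf_ops *
  foo_gem_prime_get_dma_buf_ops(struct drm_device *dev)
  {
          return &foo_dma_buf_ops;
  }

  static const struct drm_driver foo_drm_driver = {
          /* ... */
          .gem_prime_get_dma_buf_ops = foo_gem_prime_get_dma_buf_ops,
  };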

v5:
- New patch

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/drm_prime.c | 14 +++++++++++---
 include/drm/drm_drv.h       |  8 ++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 43a10b4af43a..3796844af418 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -903,6 +903,15 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
 }
 EXPORT_SYMBOL(drm_prime_get_contiguous_size);
 
+static const struct dma_buf_ops *
+drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
+{
+	if (dev->driver->gem_prime_get_dma_buf_ops)
+		return dev->driver->gem_prime_get_dma_buf_ops(dev);
+
+	return &drm_gem_prime_dmabuf_ops;
+}
+
 /**
  * drm_gem_prime_export - helper library implementation of the export callback
  * @obj: GEM object to export
@@ -919,7 +928,7 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
 	struct dma_buf_export_info exp_info = {
 		.exp_name = KBUILD_MODNAME, /* white lie for debug */
 		.owner = dev->driver->fops->owner,
-		.ops = &drm_gem_prime_dmabuf_ops,
+		.ops = drm_gem_prime_get_dma_buf_ops(dev),
 		.size = obj->size,
 		.flags = flags,
 		.priv = obj,
@@ -930,7 +939,6 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
 }
 EXPORT_SYMBOL(drm_gem_prime_export);
 
-
 /**
  * drm_gem_is_prime_exported_dma_buf -
  * checks if the DMA-BUF was exported from a GEM object belonging to @dev.
@@ -946,7 +954,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
 {
 	struct drm_gem_object *obj = dma_buf->priv;
 
-	return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) && (obj->dev == dev);
+	return (dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev)) && (obj->dev == dev);
 }
 EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
 
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 42fc085f986d..f18da3c0edb8 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -326,6 +326,14 @@ struct drm_driver {
 				struct dma_buf_attachment *attach,
 				struct sg_table *sgt);
 
+	/**
+	 * @gem_prime_get_dma_buf_ops:
+	 *
+	 * Optional hook used by the PRIME helpers to get the custom dma_buf_ops
+	 * used by this driver.
+	 */
+	const struct dma_buf_ops *(*gem_prime_get_dma_buf_ops)(struct drm_device *dev);
+
 	/**
 	 * @dumb_create:
 	 *
-- 
2.51.0



* [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:31   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation Christian König
  2025-11-03 20:34   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Akash Goel
  2025-10-30 14:05 ` [PATCH v5 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

The default implementation simply takes care of invalidating/flushing
caches around CPU accesses. It covers both the exporter and
the importers, which forces us to overload the default
::[un]map_dma_buf() implementations provided by drm_prime.c so we can
store the sgt.

v5:
- New patch

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 114 +++++++++++++++++++++++++
 include/drm/drm_gem_shmem_helper.h     |  10 +++
 2 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index dc94a27710e5..e49c75739c20 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -893,6 +893,120 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_no_map);
 
+/**
+ * drm_gem_shmem_prime_map_dma_buf - Default map_dma_buf() for exported buffers
+ * @attach: attachment
+ * @dir: type of access done by this attachment
+ *
+ * Default implementation for dma_buf_ops::map_dma_buf(). This is just a wrapper
+ * around drm_gem_map_dma_buf() that lets us set the dma_buf_attachment::priv
+ * to the sgt so that drm_gem_shmem_prime_{begin,end}_cpu_access() can sync
+ * around CPU accesses.
+ */
+struct sg_table *
+drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
+				enum dma_data_direction dir)
+{
+	struct sg_table *sgt = drm_gem_map_dma_buf(attach, dir);
+
+	if (!IS_ERR(sgt))
+		attach->priv = sgt;
+
+	return sgt;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_map_dma_buf);
+
+/**
+ * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
+ * @attach: attachment
+ * @sgt: SG table to unmap
+ * @dir: type of access done by this attachment
+ *
+ * Default implementation for dma_buf_ops::unmap_dma_buf(). This is just a wrapper
+ * around drm_gem_unmap_dma_buf() that lets us reset the dma_buf_attachment::priv
+ * field so that drm_gem_shmem_prime_{begin,end}_cpu_access() don't consider it
+ * as a mapped attachment to sync against.
+ */
+void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
+				       struct sg_table *sgt,
+				       enum dma_data_direction dir)
+{
+	attach->priv = NULL;
+	drm_gem_unmap_dma_buf(attach, sgt, dir);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_unmap_dma_buf);
+
+/**
+ * drm_gem_shmem_prime_begin_cpu_access - Default begin_cpu_access() for exported buffers
+ * @dma_buf: The exported DMA buffer this acts on
+ * @dir: direction of the access
+ *
+ * Default implementation for dma_buf_ops::begin_cpu_access(). This only takes care of
+ * cache maintenance.
+ */
+int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
+					 enum dma_data_direction dir)
+{
+	struct drm_gem_object *obj = dma_buf->priv;
+	struct drm_device *dev = obj->dev;
+	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+	struct dma_buf_attachment *attach;
+
+	dma_resv_lock(obj->resv, NULL);
+	if (shmem->sgt)
+		dma_sync_sgtable_for_cpu(dev->dev, shmem->sgt, dir);
+
+	if (shmem->vaddr)
+		invalidate_kernel_vmap_range(shmem->vaddr, shmem->base.size);
+
+	list_for_each_entry(attach, &dma_buf->attachments, node) {
+		struct sg_table *sgt = attach->priv;
+
+		if (sgt)
+			dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
+	}
+	dma_resv_unlock(obj->resv);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_begin_cpu_access);
+
+/**
+ * drm_gem_shmem_prime_end_cpu_access - Default end_cpu_access() for exported buffers
+ * @dma_buf: The exported DMA buffer this acts on
+ * @dir: direction of the access
+ *
+ * Default implementation for dma_buf_ops::end_cpu_access(). This only takes care of
+ * cache maintenance.
+ */
+int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
+				       enum dma_data_direction dir)
+{
+	struct drm_gem_object *obj = dma_buf->priv;
+	struct drm_device *dev = obj->dev;
+	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+	struct dma_buf_attachment *attach;
+
+	dma_resv_lock(obj->resv, NULL);
+	list_for_each_entry(attach, &dma_buf->attachments, node) {
+		struct sg_table *sgt = attach->priv;
+
+		if (sgt)
+			dma_sync_sgtable_for_device(attach->dev, sgt, dir);
+	}
+
+	if (shmem->vaddr)
+		flush_kernel_vmap_range(shmem->vaddr, shmem->base.size);
+
+	if (shmem->sgt)
+		dma_sync_sgtable_for_device(dev->dev, shmem->sgt, dir);
+
+	dma_resv_unlock(obj->resv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_end_cpu_access);
+
 MODULE_DESCRIPTION("DRM SHMEM memory-management helpers");
 MODULE_IMPORT_NS("DMA_BUF");
 MODULE_LICENSE("GPL v2");
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index 589f7bfe7506..075275d6b2fd 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -291,6 +291,16 @@ int drm_gem_shmem_dumb_create(struct drm_file *file, struct drm_device *dev,
 			      struct drm_mode_create_dumb *args);
 struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
 							 struct dma_buf *buf);
+struct sg_table *
+drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
+				enum dma_data_direction dir);
+void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
+				       struct sg_table *sgt,
+				       enum dma_data_direction dir);
+int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
+					 enum dma_data_direction dir);
+int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
+				       enum dma_data_direction dir);
 
 /**
  * DRM_GEM_SHMEM_DRIVER_OPS - Default shmem GEM operations
-- 
2.51.0



* [PATCH v5 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 04/16] drm/panthor: Provide a custom dma_buf implementation Boris Brezillon
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel, Boris Brezillon

From: Faith Ekstrand <faith.ekstrand@collabora.com>

This enables syncing mapped GEM objects between the CPU and GPU via calls
to dma_sync_*(). It's a bit annoying as it requires walking the sg_table,
so it's best if drivers don't each hand-roll it.

v2:
- s/drm_gem_shmem_sync_mmap/drm_gem_shmem_sync/
- Change the prototype to match drm_gem_object_funcs::sync()
- Add a wrapper for drm_gem_object_funcs::sync()

v3:
- No changes

v4:
- Add Steve's R-b

v5:
- Change the semantics of the drm_gem_shmem_sync() helper to better
  reflect the UMD cache flush/flush+invalidate semantics (discussed
  with Faith)
- Drop R-bs

Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 93 ++++++++++++++++++++++++++
 include/drm/drm_gem_shmem_helper.h     | 14 ++++
 2 files changed, 107 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index e49c75739c20..d9266e22a0dc 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -690,6 +690,99 @@ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_mmap);
 
+/**
+ * drm_gem_shmem_sync - Sync CPU-mapped data
+ * @shmem: shmem GEM object
+ * @offset: Offset into the GEM object
+ * @size: Size of the area to sync
+ * @type: Type of synchronization
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_shmem_sync(struct drm_gem_shmem_object *shmem, size_t offset,
+		       size_t size, enum drm_gem_shmem_sync_type type)
+{
+	const struct drm_device *dev = shmem->base.dev;
+	struct sg_table *sgt;
+	struct scatterlist *sgl;
+	unsigned int count;
+
+	/* Make sure the range is in bounds. */
+	if (offset + size < offset || offset + size > shmem->base.size)
+		return -EINVAL;
+
+	/* Disallow CPU-cache maintenance on imported buffers. */
+	if (drm_gem_is_imported(&shmem->base))
+		return -EINVAL;
+
+	switch (type) {
+	case DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH:
+	case DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	/* Don't bother if it's WC-mapped */
+	if (shmem->map_wc)
+		return 0;
+
+	/* Nothing to do if the size is zero. */
+	if (size == 0)
+		return 0;
+
+	sgt = drm_gem_shmem_get_pages_sgt(shmem);
+	if (IS_ERR(sgt))
+		return PTR_ERR(sgt);
+
+	for_each_sgtable_dma_sg(sgt, sgl, count) {
+		dma_addr_t paddr = sg_dma_address(sgl);
+		size_t len = sg_dma_len(sgl);
+
+		if (size == 0)
+			break;
+
+		if (len <= offset) {
+			offset -= len;
+			continue;
+		}
+
+		paddr += offset;
+		len -= offset;
+		len = min_t(size_t, len, size);
+		size -= len;
+		offset = 0;
+
+		/* It's unclear whether dma_sync_xxx() is the right API to do CPU
+		 * cache maintenance given an IOMMU can register their own
+		 * implementation doing more than just CPU cache flushes/invalidation,
+		 * and what we really care about here is CPU caches only, but that's
+		 * the best we have that is both arch-agnostic and does at least the
+		 * CPU cache maintenance on a <page,offset,size> tuple.
+		 *
+		 * Also, I wish we could do a single
+		 *
+		 *	dma_sync_single_for_device(BIDIR)
+		 *
+		 * and get a flush+invalidate, but that's not how it's implemented
+		 * in practice (at least on arm64), so we have to make it
+		 *
+		 *	dma_sync_single_for_device(TO_DEVICE)
+		 *	dma_sync_single_for_cpu(FROM_DEVICE)
+		 *
+		 * for the flush+invalidate case.
+		 */
+		dma_sync_single_for_device(dev->dev, paddr, len, DMA_TO_DEVICE);
+		if (type == DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE)
+			dma_sync_single_for_cpu(dev->dev, paddr, len, DMA_FROM_DEVICE);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_sync);
+
 /**
  * drm_gem_shmem_print_info() - Print &drm_gem_shmem_object info for debugfs
  * @shmem: shmem GEM object
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index 075275d6b2fd..b0b6d0104a9a 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -124,6 +124,20 @@ void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
 				 struct iosys_map *map);
 int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma);
 
+/**
+ * enum drm_gem_shmem_sync_type - Type of synchronization
+ */
+enum drm_gem_shmem_sync_type {
+	/** @DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH: Flush CPU caches */
+	DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH = 0,
+
+	/** @DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches */
+	DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE,
+};
+
+int drm_gem_shmem_sync(struct drm_gem_shmem_object *shmem, size_t offset,
+		       size_t size, enum drm_gem_shmem_sync_type type);
+
 int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem);
 void drm_gem_shmem_unpin_locked(struct drm_gem_shmem_object *shmem);
 
-- 
2.51.0



* [PATCH v5 04/16] drm/panthor: Provide a custom dma_buf implementation
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (2 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 05/16] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

Before we introduce cached CPU mappings, we want a dma_buf
implementation that satisfies synchronization requests around CPU
accesses to dma_bufs exported by our driver. Let's provide our own
implementation relying on the default gem_shmem_prime helpers designed
for that purpose.

v5:
- New patch

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_drv.c |  1 +
 drivers/gpu/drm/panthor/panthor_gem.c | 19 +++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_gem.h |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index fb4b293f17f0..99a4534c0074 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1618,6 +1618,7 @@ static const struct drm_driver panthor_drm_driver = {
 
 	.gem_create_object = panthor_gem_create_object,
 	.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
+	.gem_prime_get_dma_buf_ops = panthor_gem_prime_get_dma_buf_ops,
 #ifdef CONFIG_DEBUG_FS
 	.debugfs_init = panthor_debugfs_init,
 #endif
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 156c7a0b62a2..160692e45f44 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -191,6 +191,25 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, struct panthor_vm *vm,
 	return ERR_PTR(ret);
 }
 
+static const struct dma_buf_ops panthor_dma_buf_ops = {
+	.attach = drm_gem_map_attach,
+	.detach = drm_gem_map_detach,
+	.map_dma_buf = drm_gem_shmem_prime_map_dma_buf,
+	.unmap_dma_buf = drm_gem_shmem_prime_unmap_dma_buf,
+	.release = drm_gem_dmabuf_release,
+	.mmap = drm_gem_dmabuf_mmap,
+	.vmap = drm_gem_dmabuf_vmap,
+	.vunmap = drm_gem_dmabuf_vunmap,
+	.begin_cpu_access = drm_gem_shmem_prime_begin_cpu_access,
+	.end_cpu_access = drm_gem_shmem_prime_end_cpu_access,
+};
+
+const struct dma_buf_ops *
+panthor_gem_prime_get_dma_buf_ops(struct drm_device *dev)
+{
+	return &panthor_dma_buf_ops;
+}
+
 static struct dma_buf *
 panthor_gem_prime_export(struct drm_gem_object *obj, int flags)
 {
diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
index 80c6e24112d0..528088839468 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.h
+++ b/drivers/gpu/drm/panthor/panthor_gem.h
@@ -147,6 +147,9 @@ panthor_gem_create_with_handle(struct drm_file *file,
 void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
 void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
 
+const struct dma_buf_ops *
+panthor_gem_prime_get_dma_buf_ops(struct drm_device *dev);
+
 static inline u64
 panthor_kernel_bo_gpuva(struct panthor_kernel_bo *bo)
 {
-- 
2.51.0



* [PATCH v5 05/16] drm/panthor: Fix panthor_gpu_coherency_set()
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (3 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 04/16] drm/panthor: Provide a custom dma_buf implementation Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel, Akash Goel

The GPU_COHERENCY_PROTOCOL register takes one of the GPU_COHERENCY_xx
values, not BIT(GPU_COHERENCY_xx).

v3:
- New commit

v4:
- Add Steve's R-b

v5:
- No changes

Cc: Akash Goel <akash.goel@arm.com>
Fixes: dd7db8d911a1 ("drm/panthor: Explicitly set the coherency mode")
Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 9d98720ce03f..7f9a28e90409 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -49,7 +49,7 @@ struct panthor_gpu {
 static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
 {
 	gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
-		ptdev->coherent ? GPU_COHERENCY_PROT_BIT(ACE_LITE) : GPU_COHERENCY_NONE);
+		  ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
 }
 
 static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
-- 
2.51.0



* [PATCH v5 06/16] drm/panthor: Expose the selected coherency protocol to the UMD
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (4 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 05/16] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

If we want to be able to skip CPU cache maintenance operations on
CPU-cached mappings, the UMD needs to know the kind of coherency
in place. Add a field to drm_panthor_gpu_info for that purpose. We can
re-use a padding field, since this object is write-only from the KMD
perspective and existing UMDs simply ignore the padding.
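
On the UMD side, the expected usage is to query the GPU info once and
derive whether cache maintenance is needed. A sketch using libdrm's
drmIoctl() and the existing DEV_QUERY uAPI, error handling omitted:

  struct drm_panthor_gpu_info info = {0};
  struct drm_panthor_dev_query query = {
          .type = DRM_PANTHOR_DEV_QUERY_GPU_INFO,
          .size = sizeof(info),
          .pointer = (__u64)(uintptr_t)&info,
  };

  drmIoctl(fd, DRM_IOCTL_PANTHOR_DEV_QUERY, &query);

  /* CPU cache maintenance is only needed if the GPU isn't IO-coherent. */
  bool needs_cache_maintenance =
          info.selected_coherency == DRM_PANTHOR_GPU_COHERENCY_NONE;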

v2:
- New commit

v3:
- Make coherency protocol a real enum, not a bitmask
- Add BUILD_BUG_ON()s to make sure the values in panthor_regs.h and
  those exposed through the uAPI match

v4:
- Add Steve's R-b

v5:
- No changes

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_device.c | 10 +++++-
 drivers/gpu/drm/panthor/panthor_gpu.c    |  2 +-
 include/uapi/drm/panthor_drm.h           | 39 ++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index c7033d82cef5..492585014e9f 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -25,6 +25,12 @@
 
 static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
 {
+	BUILD_BUG_ON(GPU_COHERENCY_NONE != DRM_PANTHOR_GPU_COHERENCY_NONE);
+	BUILD_BUG_ON(GPU_COHERENCY_ACE_LITE != DRM_PANTHOR_GPU_COHERENCY_ACE_LITE);
+	BUILD_BUG_ON(GPU_COHERENCY_ACE != DRM_PANTHOR_GPU_COHERENCY_ACE);
+
+	/* Start with no coherency, and update it if the device is flagged coherent. */
+	ptdev->gpu_info.selected_coherency = GPU_COHERENCY_NONE;
 	ptdev->coherent = device_get_dma_attr(ptdev->base.dev) == DEV_DMA_COHERENT;
 
 	if (!ptdev->coherent)
@@ -34,8 +40,10 @@ static int panthor_gpu_coherency_init(struct panthor_device *ptdev)
 	 * ACE protocol has never been supported for command stream frontend GPUs.
 	 */
 	if ((gpu_read(ptdev, GPU_COHERENCY_FEATURES) &
-		      GPU_COHERENCY_PROT_BIT(ACE_LITE)))
+		      GPU_COHERENCY_PROT_BIT(ACE_LITE))) {
+		ptdev->gpu_info.selected_coherency = GPU_COHERENCY_ACE_LITE;
 		return 0;
+	}
 
 	drm_err(&ptdev->base, "Coherency not supported by the device");
 	return -ENOTSUPP;
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 7f9a28e90409..a95c0b94ef58 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -49,7 +49,7 @@ struct panthor_gpu {
 static void panthor_gpu_coherency_set(struct panthor_device *ptdev)
 {
 	gpu_write(ptdev, GPU_COHERENCY_PROTOCOL,
-		  ptdev->coherent ? GPU_COHERENCY_ACE_LITE : GPU_COHERENCY_NONE);
+		  ptdev->gpu_info.selected_coherency);
 }
 
 static void panthor_gpu_l2_config_set(struct panthor_device *ptdev)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 467d365ed7ba..f0f637e0631d 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -245,6 +245,26 @@ enum drm_panthor_dev_query_type {
 	DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
 };
 
+/**
+ * enum drm_panthor_gpu_coherency - Type of GPU coherency
+ */
+enum drm_panthor_gpu_coherency {
+	/**
+	 * @DRM_PANTHOR_GPU_COHERENCY_ACE_LITE: ACE Lite coherency.
+	 */
+	DRM_PANTHOR_GPU_COHERENCY_ACE_LITE = 0,
+
+	/**
+	 * @DRM_PANTHOR_GPU_COHERENCY_ACE: ACE coherency.
+	 */
+	DRM_PANTHOR_GPU_COHERENCY_ACE = 1,
+
+	/**
+	 * @DRM_PANTHOR_GPU_COHERENCY_NONE: No coherency.
+	 */
+	DRM_PANTHOR_GPU_COHERENCY_NONE = 31,
+};
+
 /**
  * struct drm_panthor_gpu_info - GPU information
  *
@@ -301,7 +321,16 @@ struct drm_panthor_gpu_info {
 	 */
 	__u32 thread_max_barrier_size;
 
-	/** @coherency_features: Coherency features. */
+	/**
+	 * @coherency_features: Coherency features.
+	 *
+	 * Combination of drm_panthor_gpu_coherency flags.
+	 *
+	 * Note that this just reflects the coherency protocols supported by the
+	 * GPU; the actual coherency in place depends on the SoC
+	 * integration and is reflected by
+	 * drm_panthor_gpu_info::selected_coherency.
+	 */
 	__u32 coherency_features;
 
 	/** @texture_features: Texture features. */
@@ -310,8 +339,12 @@ struct drm_panthor_gpu_info {
 	/** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */
 	__u32 as_present;
 
-	/** @pad0: MBZ. */
-	__u32 pad0;
+	/**
+	 * @selected_coherency: Coherency selected for this device.
+	 *
+	 * One of drm_panthor_gpu_coherency.
+	 */
+	__u32 selected_coherency;
 
 	/** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */
 	__u64 shader_present;
-- 
2.51.0



* [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (5 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-31  7:25   ` Marcin Ślusarz
  2025-11-03 20:42   ` Akash Goel
  2025-10-30 14:05 ` [PATCH v5 08/16] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

This will be used by the UMD to synchronize CPU-cached mappings when
the UMD can't do it directly (no usermode cache maintenance instruction
on Arm32).
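
Usage from the UMD would look roughly like this (sketch with libdrm's
drmIoctl(), error handling omitted; bo_handle/bo_size stand for an
existing CPU-cached BO):

  struct drm_panthor_bo_sync_op op = {
          .handle = bo_handle,
          .type = DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE,
          .offset = 0,
          .size = bo_size,
  };
  struct drm_panthor_bo_sync req = {
          .ops = DRM_PANTHOR_OBJ_ARRAY(1, &op),
  };

  drmIoctl(fd, DRM_IOCTL_PANTHOR_BO_SYNC, &req);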

v2:
- Change the flags so they better match the drm_gem_shmem_sync()
  semantics

v3:
- Add Steve's R-b

v4:
- No changes

v5:
- Drop Steve's R-b (the semantics changes call for a new review)

Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_drv.c | 42 +++++++++++++++++++++-
 drivers/gpu/drm/panthor/panthor_gem.c | 21 +++++++++++
 drivers/gpu/drm/panthor/panthor_gem.h |  3 ++
 include/uapi/drm/panthor_drm.h        | 52 +++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 99a4534c0074..cad5c4270b04 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
 		 PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
 		 PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
 		 PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
-		 PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
+		 PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
+		 PANTHOR_UOBJ_DECL(struct drm_panthor_bo_sync_op, size))
 
 /**
  * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
@@ -1394,6 +1395,44 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
 	return 0;
 }
 
+static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
+				 struct drm_file *file)
+{
+	struct drm_panthor_bo_sync *args = data;
+	struct drm_panthor_bo_sync_op *ops;
+	struct drm_gem_object *obj;
+	int ret = 0;
+
+	ret = PANTHOR_UOBJ_GET_ARRAY(ops, &args->ops);
+	if (ret)
+		return ret;
+
+	for (u32 i = 0; i < args->ops.count; i++) {
+		obj = drm_gem_object_lookup(file, ops[i].handle);
+		if (!obj) {
+			ret = -ENOENT;
+			goto err_ops;
+		}
+
+		ret = panthor_gem_sync(obj, ops[i].type, ops[i].offset,
+				       ops[i].size);
+
+		drm_gem_object_put(obj);
+
+		if (ret)
+			goto err_ops;
+	}
+
+err_ops:
+	kvfree(ops);
+
+	return ret;
+}
+
 static int
 panthor_open(struct drm_device *ddev, struct drm_file *file)
 {
@@ -1468,6 +1507,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
 	PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
 	PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
 	PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
+	PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
 };
 
 static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 160692e45f44..1b1e98c61b8c 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -357,6 +357,27 @@ panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label)
 	panthor_gem_bo_set_label(bo->obj, str);
 }
 
+int
+panthor_gem_sync(struct drm_gem_object *obj, u32 type,
+		 u64 offset, u64 size)
+{
+	enum drm_gem_shmem_sync_type shmem_sync_type;
+	struct panthor_gem_object *bo = to_panthor_bo(obj);
+
+	switch (type) {
+	case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH:
+		shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
+		break;
+	case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
+		shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
+}
+
 #ifdef CONFIG_DEBUG_FS
 struct gem_size_totals {
 	size_t size;
diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
index 528088839468..8705c492c5b6 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.h
+++ b/drivers/gpu/drm/panthor/panthor_gem.h
@@ -147,6 +147,9 @@ panthor_gem_create_with_handle(struct drm_file *file,
 void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
 void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
 
+int panthor_gem_sync(struct drm_gem_object *obj,
+		     u32 type, u64 offset, u64 size);
+
 const struct dma_buf_ops *
 panthor_gem_prime_get_dma_buf_ops(struct drm_device *dev);
 
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index f0f637e0631d..bb12760abe99 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -144,6 +144,9 @@ enum drm_panthor_ioctl_id {
 	 * pgoff_t size.
 	 */
 	DRM_PANTHOR_SET_USER_MMIO_OFFSET,
+
+	/** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
+	DRM_PANTHOR_BO_SYNC,
 };
 
 /**
@@ -1073,6 +1076,53 @@ struct drm_panthor_set_user_mmio_offset {
 	__u64 offset;
 };
 
+/**
+ * enum drm_panthor_bo_sync_op_type - BO sync type
+ */
+enum drm_panthor_bo_sync_op_type {
+	/** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH: Flush CPU caches. */
+	DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH = 0,
+
+	/** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches. */
+	DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE = 1,
+};
+
+/**
+ * struct drm_panthor_bo_sync_op - BO map sync op
+ */
+struct drm_panthor_bo_sync_op {
+	/** @handle: Handle of the buffer object to sync. */
+	__u32 handle;
+
+	/** @type: Type of operation. */
+	__u32 type;
+
+	/**
+	 * @offset: Offset into the BO at which the sync range starts.
+	 *
+	 * This will be rounded down to the nearest cache line as needed.
+	 */
+	__u64 offset;
+
+	/**
+	 * @size: Size of the range to sync
+	 *
+	 * @size + @offset will be rounded up to the nearest cache line as
+	 * needed.
+	 */
+	__u64 size;
+};
+
+/**
+ * struct drm_panthor_bo_sync - BO map sync request
+ */
+struct drm_panthor_bo_sync {
+	/**
+	 * @ops: Array of struct drm_panthor_bo_sync_op sync operations.
+	 */
+	struct drm_panthor_obj_array ops;
+};
+
 /**
  * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
  * @__access: Access type. Must be R, W or RW.
@@ -1119,6 +1169,8 @@ enum {
 		DRM_IOCTL_PANTHOR(WR, BO_SET_LABEL, bo_set_label),
 	DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET =
 		DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
+	DRM_IOCTL_PANTHOR_BO_SYNC =
+		DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
 };
 
 #if defined(__cplusplus)
-- 
2.51.0



* [PATCH v5 08/16] drm/panthor: Add an ioctl to query BO flags
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (6 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

This is useful when importing BOs, so we can know about cacheability
and flush the caches when needed.

We can also know when the buffer comes from a different subsystem and
take appropriate action (avoid CPU mappings, or do kernel-based syncs
instead of userland cache flushes).
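
A sketch of the expected UMD-side usage (libdrm's drmIoctl(), error
handling omitted; DRM_PANTHOR_BO_WB_MMAP is the flag added later in
this series):

  struct drm_panthor_bo_query_info info = { .handle = bo_handle };

  drmIoctl(fd, DRM_IOCTL_PANTHOR_BO_QUERY_INFO, &info);

  if (info.extra_flags & DRM_PANTHOR_BO_IS_IMPORTED) {
          /* Cached import: syncs must go through DRM_IOCTL_PANTHOR_BO_SYNC. */
  } else if (info.create_flags & DRM_PANTHOR_BO_WB_MMAP) {
          /* Our own cached BO: userland cache maintenance is allowed. */
  }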

v2:
- New commit

v3:
- Add Steve's R-b

v4:
- No changes

v5:
- No changes

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_drv.c | 24 +++++++++++
 include/uapi/drm/panthor_drm.h        | 57 +++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index cad5c4270b04..c07fc5dcd4a1 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1433,6 +1433,29 @@ static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
 	return ret;
 }
 
+static int panthor_ioctl_bo_query_info(struct drm_device *ddev, void *data,
+				       struct drm_file *file)
+{
+	struct drm_panthor_bo_query_info *args = data;
+	struct panthor_gem_object *bo;
+	struct drm_gem_object *obj;
+
+	obj = drm_gem_object_lookup(file, args->handle);
+	if (!obj)
+		return -ENOENT;
+
+	bo = to_panthor_bo(obj);
+	args->pad = 0;
+	args->create_flags = bo->flags;
+
+	args->extra_flags = 0;
+	if (drm_gem_is_imported(&bo->base.base))
+		args->extra_flags |= DRM_PANTHOR_BO_IS_IMPORTED;
+
+	drm_gem_object_put(obj);
+	return 0;
+}
+
 static int
 panthor_open(struct drm_device *ddev, struct drm_file *file)
 {
@@ -1508,6 +1531,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
 	PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
 	PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
 	PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
+	PANTHOR_IOCTL(BO_QUERY_INFO, bo_query_info, DRM_RENDER_ALLOW),
 };
 
 static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index bb12760abe99..7eec9f922183 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -147,6 +147,13 @@ enum drm_panthor_ioctl_id {
 
 	/** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
 	DRM_PANTHOR_BO_SYNC,
+
+	/**
+	 * @DRM_PANTHOR_BO_QUERY_INFO: Query information about a BO.
+	 *
+	 * This is useful for imported BOs.
+	 */
+	DRM_PANTHOR_BO_QUERY_INFO,
 };
 
 /**
@@ -1123,6 +1130,54 @@ struct drm_panthor_bo_sync {
 	struct drm_panthor_obj_array ops;
 };
 
+/**
+ * enum drm_panthor_bo_extra_flags - Set of flags returned on a BO_QUERY_INFO request
+ *
+ * Those are flags reflecting BO properties that are not directly coming from the flags
+ * passed are creation time, or information on BOs that were imported from other drivers.
+ */
+enum drm_panthor_bo_extra_flags {
+	/**
+	 * @DRM_PANTHOR_BO_IS_IMPORTED: BO has been imported from an external driver.
+	 *
+	 * Note that imported dma-buf handles are not flagged as imported if they
+	 * were exported by panthor. Only buffers coming from other drivers
+	 * (dma heaps, other GPUs, display controllers, V4L, ...) get this flag.
+	 *
+	 * It's also important to note that all imported BOs are mapped cached and can't
+	 * be considered IO-coherent even if the GPU is. This means they require explicit
+	 * syncs that must go through the DRM_PANTHOR_BO_SYNC ioctl (userland cache
+	 * maintenance is not allowed in that case, because extra operations might be
+	 * needed to make changes visible to the CPU/device, like buffer migration when the
+	 * exporter is a GPU with its own VRAM).
+	 */
+	DRM_PANTHOR_BO_IS_IMPORTED = (1 << 0),
+};
+
+/**
+ * struct drm_panthor_bo_query_info - Query BO info
+ */
+struct drm_panthor_bo_query_info {
+	/** @handle: Handle of the buffer object to query flags on. */
+	__u32 handle;
+
+	/**
+	 * @extra_flags: Combination of enum drm_panthor_bo_extra_flags flags.
+	 */
+	__u32 extra_flags;
+
+	/**
+	 * @create_flags: Flags passed at creation time.
+	 *
+	 * Combination of enum drm_panthor_bo_flags flags.
+	 * Will be zero if the buffer comes from a different driver.
+	 */
+	__u32 create_flags;
+
+	/** @pad: Will be zero on return. */
+	__u32 pad;
+};
+
 /**
  * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
  * @__access: Access type. Must be R, W or RW.
@@ -1171,6 +1226,8 @@ enum {
 		DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
 	DRM_IOCTL_PANTHOR_BO_SYNC =
 		DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
+	DRM_IOCTL_PANTHOR_BO_QUERY_INFO =
+		DRM_IOCTL_PANTHOR(WR, BO_QUERY_INFO, bo_query_info),
 };
 
 #if defined(__cplusplus)
-- 
2.51.0



* [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (7 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 08/16] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-11-03 16:43   ` Akash Goel
  2025-10-30 14:05 ` [PATCH v5 10/16] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Loïc Molinari, kernel, Boris Brezillon

From: Loïc Molinari <loic.molinari@collabora.com>

Will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or on which the access
pattern makes non-cacheable mappings inefficient.

Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
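
From the UMD, requesting a cacheable CPU mapping is just a creation
flag (sketch, error handling omitted; per the uAPI doc below, WB_MMAP
is only valid on VM-private BOs, hence the exclusive_vm_id):

  struct drm_panthor_bo_create req = {
          .size = 0x10000,
          .flags = DRM_PANTHOR_BO_WB_MMAP,
          .exclusive_vm_id = vm_id,
  };

  drmIoctl(fd, DRM_IOCTL_PANTHOR_BO_CREATE, &req);
  /* req.handle now names a BO whose CPU mapping will be write-back
   * cacheable.
   */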

v2:
- Add more to the commit message
- Tweak the doc
- Make sure we sync the section of the BO pointing to the CS
  syncobj before we read its seqno

v3:
- Fix formatting/spelling issues

v4:
- Add Steve's R-b

v5:
- Drop Steve's R-b (changes in the ioctl semantics requiring
  new review)

Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_drv.c   |  7 ++++-
 drivers/gpu/drm/panthor/panthor_gem.c   | 37 +++++++++++++++++++++++--
 drivers/gpu/drm/panthor/panthor_sched.c | 18 ++++++++++--
 include/uapi/drm/panthor_drm.h          | 12 ++++++++
 4 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index c07fc5dcd4a1..4e915f5ef3fa 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -900,7 +900,8 @@ static int panthor_ioctl_vm_destroy(struct drm_device *ddev, void *data,
 	return panthor_vm_pool_destroy_vm(pfile->vms, args->id);
 }
 
-#define PANTHOR_BO_FLAGS		DRM_PANTHOR_BO_NO_MMAP
+#define PANTHOR_BO_FLAGS		(DRM_PANTHOR_BO_NO_MMAP | \
+					 DRM_PANTHOR_BO_WB_MMAP)
 
 static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
 				   struct drm_file *file)
@@ -919,6 +920,10 @@ static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
 		goto out_dev_exit;
 	}
 
+	if ((args->flags & DRM_PANTHOR_BO_NO_MMAP) &&
+	    (args->flags & DRM_PANTHOR_BO_WB_MMAP)) {
+		ret = -EINVAL;
+		goto out_dev_exit;
+	}
+
 	if (args->exclusive_vm_id) {
 		vm = panthor_vm_pool_get_vm(pfile->vms, args->exclusive_vm_id);
 		if (!vm) {
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 1b1e98c61b8c..479a779ee59d 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -58,6 +58,39 @@ static void panthor_gem_debugfs_set_usage_flags(struct panthor_gem_object *bo, u
 static void panthor_gem_debugfs_bo_init(struct panthor_gem_object *bo) {}
 #endif
 
+static bool
+should_map_wc(struct panthor_gem_object *bo, struct panthor_vm *exclusive_vm)
+{
+	struct panthor_device *ptdev = container_of(bo->base.base.dev, struct panthor_device, base);
+
+	/* We can't do uncached mappings if the device is coherent,
+	 * because the zeroing done by the shmem layer at page allocation
+	 * time happens on a cached mapping which isn't CPU-flushed (at least
+	 * not on Arm64 where the flush is deferred to PTE setup time, and
+	 * only done conditionally based on the mapping permissions). We can't
+	 * rely on dma_map_sgtable()/dma_sync_sgtable_for_xxx() either to flush
+	 * those, because they are NOPed if dev_is_dma_coherent() returns true.
+	 *
+	 * FIXME: Note that this problem is going to pop up again when we
+	 * decide to support mapping buffers with the NO_MMAP flag as
+	 * non-shareable (AKA buffers accessed only by the GPU), because we
+	 * need the same CPU flush to happen after page allocation, otherwise
+	 * there's a risk of data leak or late corruption caused by a dirty
+	 * cacheline being evicted. At this point we'll need a way to force
+	 * CPU cache maintenance regardless of whether the device is coherent
+	 * or not.
+	 */
+	if (ptdev->coherent)
+		return false;
+
+	/* Cached mappings are explicitly requested, so no write-combine. */
+	if (bo->flags & DRM_PANTHOR_BO_WB_MMAP)
+		return false;
+
+	/* The default is write-combine. */
+	return true;
+}
+
 static void panthor_gem_free_object(struct drm_gem_object *obj)
 {
 	struct panthor_gem_object *bo = to_panthor_bo(obj);
@@ -152,6 +185,7 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, struct panthor_vm *vm,
 	bo = to_panthor_bo(&obj->base);
 	kbo->obj = &obj->base;
 	bo->flags = bo_flags;
+	bo->base.map_wc = should_map_wc(bo, vm);
 
 	if (vm == panthor_fw_vm(ptdev))
 		debug_flags |= PANTHOR_DEBUGFS_GEM_USAGE_FLAG_FW_MAPPED;
@@ -255,7 +289,6 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = {
  */
 struct drm_gem_object *panthor_gem_create_object(struct drm_device *ddev, size_t size)
 {
-	struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
 	struct panthor_gem_object *obj;
 
 	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -263,7 +296,6 @@ struct drm_gem_object *panthor_gem_create_object(struct drm_device *ddev, size_t
 		return ERR_PTR(-ENOMEM);
 
 	obj->base.base.funcs = &panthor_gem_funcs;
-	obj->base.map_wc = !ptdev->coherent;
 	mutex_init(&obj->label.lock);
 
 	panthor_gem_debugfs_bo_init(obj);
@@ -298,6 +330,7 @@ panthor_gem_create_with_handle(struct drm_file *file,
 
 	bo = to_panthor_bo(&shmem->base);
 	bo->flags = flags;
+	bo->base.map_wc = should_map_wc(bo, exclusive_vm);
 
 	if (exclusive_vm) {
 		bo->exclusive_vm_root_gem = panthor_vm_root_gem(exclusive_vm);
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index f5e01cb16cfc..7d5da5717de2 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -868,8 +868,11 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
 	struct iosys_map map;
 	int ret;
 
-	if (queue->syncwait.kmap)
-		return queue->syncwait.kmap + queue->syncwait.offset;
+	if (queue->syncwait.kmap) {
+		bo = container_of(queue->syncwait.obj,
+				  struct panthor_gem_object, base.base);
+		goto out_sync;
+	}
 
 	bo = panthor_vm_get_bo_for_va(group->vm,
 				      queue->syncwait.gpu_va,
@@ -886,6 +889,17 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
 	if (drm_WARN_ON(&ptdev->base, !queue->syncwait.kmap))
 		goto err_put_syncwait_obj;
 
+out_sync:
+	/* Make sure the CPU caches are invalidated before the seqno is read.
+	 * drm_gem_shmem_sync() is a NOP if map_wc=true, so no need to check
+	 * it here.
+	 */
+	drm_gem_shmem_sync(&bo->base, queue->syncwait.offset,
+			   queue->syncwait.sync64 ?
+				   sizeof(struct panthor_syncobj_64b) :
+				   sizeof(struct panthor_syncobj_32b),
+			   DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE);
+
 	return queue->syncwait.kmap + queue->syncwait.offset;
 
 err_put_syncwait_obj:
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 7eec9f922183..57e2f5ffa03c 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -681,6 +681,18 @@ struct drm_panthor_vm_get_state {
 enum drm_panthor_bo_flags {
 	/** @DRM_PANTHOR_BO_NO_MMAP: The buffer object will never be CPU-mapped in userspace. */
 	DRM_PANTHOR_BO_NO_MMAP = (1 << 0),
+
+	/**
+	 * @DRM_PANTHOR_BO_WB_MMAP: Force "Write-Back Cacheable" CPU mapping.
+	 *
+	 * When set, userspace CPU mappings of the buffer object use the
+	 * "Write-Back Cacheable" attribute. Otherwise, the mapping uses the
+	 * "Non-Cacheable" attribute if the GPU is not IO coherent.
+	 *
+	 * Can't be set if exclusive_vm_id=0 (only private BOs can be mapped
+	 * cacheable).
+	 */
+	DRM_PANTHOR_BO_WB_MMAP = (1 << 1),
 };
 
 /**
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 10/16] drm/panthor: Bump the driver version to 1.6
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (8 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 11/16] drm/panfrost: Provide a custom dma_buf implementation Boris Brezillon
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel, Boris Brezillon

From: Faith Ekstrand <faith.ekstrand@collabora.com>

Bump the driver version to reflect the new CPU-cached mapping
capability.

v2:
- Quickly describe what the new version exposes in the commit message

v3:
- Add Steve's R-b

v4:
- No changes

v5:
- No changes

Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panthor/panthor_drv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 4e915f5ef3fa..2b2f2ae471a8 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1670,6 +1670,10 @@ static void panthor_debugfs_init(struct drm_minor *minor)
  * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
  * - 1.4 - adds DRM_IOCTL_PANTHOR_BO_SET_LABEL ioctl
  * - 1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
+ * - 1.6 - adds DRM_PANTHOR_BO_WB_MMAP flag
+ *       - adds DRM_IOCTL_PANTHOR_BO_SYNC ioctl
+ *       - adds DRM_IOCTL_PANTHOR_BO_QUERY_INFO ioctl
+ *       - adds drm_panthor_gpu_info::selected_coherency
  */
 static const struct drm_driver panthor_drm_driver = {
 	.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -1683,7 +1687,7 @@ static const struct drm_driver panthor_drm_driver = {
 	.name = "panthor",
 	.desc = "Panthor DRM driver",
 	.major = 1,
-	.minor = 5,
+	.minor = 6,
 
 	.gem_create_object = panthor_gem_create_object,
 	.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 11/16] drm/panfrost: Provide a custom dma_buf implementation
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (9 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 10/16] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

Before we introduce cached CPU mappings, we want a dma_buf
implementation that satisfies synchronization requests around CPU
accesses to dma_bufs exported by our driver. Let's provide our own
implementation relying on the default gem_shmem_prime helpers
designed for that purpose.
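
For reference, an importer or a userspace mapping of the exported
dma_buf is expected to bracket its CPU accesses with
DMA_BUF_IOCTL_SYNC, which is what ends up calling these hooks. A
minimal sketch of the userspace side (untested, error handling
omitted, dmabuf_fd being the exported dma_buf fd):

  #include <linux/dma-buf.h>
  #include <sys/ioctl.h>

  struct dma_buf_sync sync = {
          .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
  };

  /* Lands in dma_buf_ops::begin_cpu_access() on the exporter side. */
  ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

  /* ... CPU reads/writes through the mapping ... */

  sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
  ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);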

v5:
- New patch

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c |  1 +
 drivers/gpu/drm/panfrost/panfrost_gem.c | 19 +++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1c3c574cd64a..e3cdc0c95a56 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -852,6 +852,7 @@ static const struct drm_driver panfrost_drm_driver = {
 
 	.gem_create_object	= panfrost_gem_create_object,
 	.gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
+	.gem_prime_get_dma_buf_ops = panfrost_gem_prime_get_dma_buf_ops,
 #ifdef CONFIG_DEBUG_FS
 	.debugfs_init = panfrost_debugfs_init,
 #endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 0528de674a4f..070ea7108af6 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -323,6 +323,25 @@ panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
 	return bo;
 }
 
+static const struct dma_buf_ops panfrost_dma_buf_ops = {
+	.attach = drm_gem_map_attach,
+	.detach = drm_gem_map_detach,
+	.map_dma_buf = drm_gem_shmem_prime_map_dma_buf,
+	.unmap_dma_buf = drm_gem_shmem_prime_unmap_dma_buf,
+	.release = drm_gem_dmabuf_release,
+	.mmap = drm_gem_dmabuf_mmap,
+	.vmap = drm_gem_dmabuf_vmap,
+	.vunmap = drm_gem_dmabuf_vunmap,
+	.begin_cpu_access = drm_gem_shmem_prime_begin_cpu_access,
+	.end_cpu_access = drm_gem_shmem_prime_end_cpu_access,
+};
+
+const struct dma_buf_ops *
+panfrost_gem_prime_get_dma_buf_ops(struct drm_device *dev)
+{
+	return &panfrost_dma_buf_ops;
+}
+
 struct drm_gem_object *
 panfrost_gem_prime_import_sg_table(struct drm_device *dev,
 				   struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 8de3e76f2717..c63264464271 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -130,6 +130,8 @@ struct drm_gem_object *
 panfrost_gem_prime_import_sg_table(struct drm_device *dev,
 				   struct dma_buf_attachment *attach,
 				   struct sg_table *sgt);
+const struct dma_buf_ops *
+panfrost_gem_prime_get_dma_buf_ops(struct drm_device *dev);
 
 struct panfrost_gem_object *
 panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (10 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 11/16] drm/panfrost: Provide a custom dma_buf implementation Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

Will be needed if we want to skip CPU cache maintenance operations when
the GPU can snoop CPU caches.
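
On the UMD side, the query would look roughly like this (untested
sketch using libdrm's drmIoctl()):

  struct drm_panfrost_get_param get = {
          .param = DRM_PANFROST_PARAM_SELECTED_COHERENCY,
  };

  if (drmIoctl(fd, DRM_IOCTL_PANFROST_GET_PARAM, &get) == 0) {
          /* ACE or ACE-Lite means the GPU snoops CPU caches, so
           * CPU cache maintenance can be skipped.
           */
          bool io_coherent =
                  get.value != DRM_PANFROST_GPU_COHERENCY_NONE;
  }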

v2:
- New commit

v3:
- Fix the coherency values (enum instead of bitmask)

v4:
- Fix init/test on coherency_features

v5:
- No changes

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_device.h |  1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    |  1 +
 drivers/gpu/drm/panfrost/panfrost_gpu.c    | 26 +++++++++++++++++++---
 drivers/gpu/drm/panfrost/panfrost_regs.h   | 10 +++++++--
 include/uapi/drm/panfrost_drm.h            |  7 ++++++
 5 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
index e61c4329fd07..0f3992412205 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -79,6 +79,7 @@ struct panfrost_features {
 	u32 thread_max_workgroup_sz;
 	u32 thread_max_barrier_sz;
 	u32 coherency_features;
+	u32 selected_coherency;
 	u32 afbc_features;
 	u32 texture_features[4];
 	u32 js_features[16];
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index e3cdc0c95a56..6c08a4907eea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -94,6 +94,7 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
 		PANFROST_FEATURE_ARRAY(JS_FEATURES, js_features, 15);
 		PANFROST_FEATURE(NR_CORE_GROUPS, nr_core_groups);
 		PANFROST_FEATURE(THREAD_TLS_ALLOC, thread_tls_alloc);
+		PANFROST_FEATURE(SELECTED_COHERENCY, selected_coherency);
 
 	case DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP:
 		ret = panfrost_ioctl_query_timestamp(pfdev, &param->value);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 8d049a07d393..59e3f25b2dc2 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -157,8 +157,8 @@ static void panfrost_gpu_init_quirks(struct panfrost_device *pfdev)
 	    pfdev->features.revision >= 0x2000)
 		quirks |= JM_MAX_JOB_THROTTLE_LIMIT << JM_JOB_THROTTLE_LIMIT_SHIFT;
 	else if (panfrost_model_eq(pfdev, 0x6000) &&
-		 pfdev->features.coherency_features == COHERENCY_ACE)
-		quirks |= (COHERENCY_ACE_LITE | COHERENCY_ACE) <<
+		 pfdev->features.coherency_features == BIT(COHERENCY_ACE))
+		quirks |= (BIT(COHERENCY_ACE_LITE) | BIT(COHERENCY_ACE)) <<
 			   JM_FORCE_COHERENCY_FEATURES_SHIFT;
 
 	if (panfrost_has_hw_feature(pfdev, HW_FEATURE_IDVS_GROUP_SIZE))
@@ -261,7 +261,27 @@ static int panfrost_gpu_init_features(struct panfrost_device *pfdev)
 	pfdev->features.max_threads = gpu_read(pfdev, GPU_THREAD_MAX_THREADS);
 	pfdev->features.thread_max_workgroup_sz = gpu_read(pfdev, GPU_THREAD_MAX_WORKGROUP_SIZE);
 	pfdev->features.thread_max_barrier_sz = gpu_read(pfdev, GPU_THREAD_MAX_BARRIER_SIZE);
-	pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+
+	if (panfrost_has_hw_feature(pfdev, HW_FEATURE_COHERENCY_REG))
+		pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+	else
+		pfdev->features.coherency_features = BIT(COHERENCY_ACE_LITE);
+
+	BUILD_BUG_ON(COHERENCY_ACE_LITE != DRM_PANFROST_GPU_COHERENCY_ACE_LITE);
+	BUILD_BUG_ON(COHERENCY_ACE != DRM_PANFROST_GPU_COHERENCY_ACE);
+	BUILD_BUG_ON(COHERENCY_NONE != DRM_PANFROST_GPU_COHERENCY_NONE);
+
+	if (!pfdev->coherent) {
+		pfdev->features.selected_coherency = COHERENCY_NONE;
+	} else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE)) {
+		pfdev->features.selected_coherency = COHERENCY_ACE;
+	} else if (pfdev->features.coherency_features & BIT(COHERENCY_ACE_LITE)) {
+		pfdev->features.selected_coherency = COHERENCY_ACE_LITE;
+	} else {
+		drm_WARN(&pfdev->base, true, "No known coherency protocol supported");
+		pfdev->features.selected_coherency = COHERENCY_NONE;
+	}
+
 	pfdev->features.afbc_features = gpu_read(pfdev, GPU_AFBC_FEATURES);
 	for (i = 0; i < 4; i++)
 		pfdev->features.texture_features[i] = gpu_read(pfdev, GPU_TEXTURE_FEATURES(i));
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 2b8f1617b836..ee15f6bf6e6f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -102,9 +102,15 @@
 #define GPU_L2_PRESENT_LO		0x120	/* (RO) Level 2 cache present bitmap, low word */
 #define GPU_L2_PRESENT_HI		0x124	/* (RO) Level 2 cache present bitmap, high word */
 
+/* GPU_COHERENCY_FEATURES is a bitmask of BIT(COHERENCY_xxx) values encoding the
+ * set of supported coherency protocols. GPU_COHERENCY_ENABLE is passed a
+ * COHERENCY_xxx value.
+ */
 #define GPU_COHERENCY_FEATURES		0x300	/* (RO) Coherency features present */
-#define   COHERENCY_ACE_LITE		BIT(0)
-#define   COHERENCY_ACE			BIT(1)
+#define GPU_COHERENCY_ENABLE		0x304	/* (RW) Coherency protocol selection */
+#define   COHERENCY_ACE_LITE		0
+#define   COHERENCY_ACE			1
+#define   COHERENCY_NONE		31
 
 #define GPU_STACK_PRESENT_LO		0xE00   /* (RO) Core stack present bitmap, low word */
 #define GPU_STACK_PRESENT_HI		0xE04   /* (RO) Core stack present bitmap, high word */
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index e8b47c9f6976..9bd8fa401400 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -188,6 +188,13 @@ enum drm_panfrost_param {
 	DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP,
 	DRM_PANFROST_PARAM_SYSTEM_TIMESTAMP_FREQUENCY,
 	DRM_PANFROST_PARAM_ALLOWED_JM_CTX_PRIORITIES,
+	DRM_PANFROST_PARAM_SELECTED_COHERENCY,
+};
+
+enum drm_panfrost_gpu_coherency {
+	DRM_PANFROST_GPU_COHERENCY_ACE_LITE = 0,
+	DRM_PANFROST_GPU_COHERENCY_ACE = 1,
+	DRM_PANFROST_GPU_COHERENCY_NONE = 31,
 };
 
 struct drm_panfrost_get_param {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (11 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-31  7:08   ` Marcin Ślusarz
  2025-10-30 14:05 ` [PATCH v5 14/16] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel, Boris Brezillon

From: Faith Ekstrand <faith.ekstrand@collabora.com>

This will be used by the UMD to synchronize CPU-cached mappings when
the UMD can't do it directly (no usermode cache maintenance instruction
on Arm32).
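
A single-op invocation from the UMD would look roughly like this
(untested sketch, error handling omitted; bo_handle and bo_size are
placeholders):

  struct drm_panfrost_bo_sync_op op = {
          .handle = bo_handle,
          .type = PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE,
          .offset = 0,
          .size = bo_size,
  };
  struct drm_panfrost_sync_bo args = {
          .ops = (__u64)(uintptr_t)&op,
          .op_count = 1,
  };

  drmIoctl(fd, DRM_IOCTL_PANFROST_SYNC_BO, &args);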

v2:
- Add more to the commit message
- Change the flags to better match the drm_gem_shmem_sync semantics

v3:
- Add Steve's R-b

v4:
- No changes

v5:
- Drop Steve's R-b (semantics changes requiring a new review)

Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 48 +++++++++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.c | 21 +++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  2 ++
 include/uapi/drm/panfrost_drm.h         | 45 +++++++++++++++++++++++
 4 files changed, 116 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 6c08a4907eea..f6692bd3fdaf 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -579,6 +579,53 @@ static int panfrost_ioctl_jm_ctx_destroy(struct drm_device *dev, void *data,
 	return panfrost_jm_ctx_destroy(file, args->handle);
 }
 
+static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_panfrost_sync_bo *args = data;
+	struct drm_panfrost_bo_sync_op *ops;
+	struct drm_gem_object *obj;
+	int ret;
+	u32 i;
+
+	if (args->pad)
+		return -EINVAL;
+
+	ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
+	if (!ops) {
+		DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
+		return -ENOMEM;
+	}
+
+	if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
+			   args->op_count * sizeof(*ops))) {
+		DRM_DEBUG("Failed to copy in BO sync ops\n");
+		ret = -EFAULT;
+		goto err_ops;
+	}
+
+	for (i = 0; i < args->op_count; i++) {
+		obj = drm_gem_object_lookup(file, ops[i].handle);
+		if (!obj) {
+			ret = -ENOENT;
+			goto err_ops;
+		}
+
+		ret = panfrost_gem_sync(obj, ops[i].type,
+					ops[i].offset, ops[i].size);
+
+		drm_gem_object_put(obj);
+
+		if (ret)
+			goto err_ops;
+	}
+
+err_ops:
+	kvfree(ops);
+
+	return ret;
+}
+
 int panfrost_unstable_ioctl_check(void)
 {
 	if (!unstable_ioctls)
@@ -648,6 +695,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
 	PANFROST_IOCTL(SET_LABEL_BO,	set_label_bo,	DRM_RENDER_ALLOW),
 	PANFROST_IOCTL(JM_CTX_CREATE,	jm_ctx_create,	DRM_RENDER_ALLOW),
 	PANFROST_IOCTL(JM_CTX_DESTROY,	jm_ctx_destroy,	DRM_RENDER_ALLOW),
+	PANFROST_IOCTL(SYNC_BO,		sync_bo,	DRM_RENDER_ALLOW),
 };
 
 static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 070ea7108af6..05d3f8a6fa78 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -384,6 +384,27 @@ panfrost_gem_set_label(struct drm_gem_object *obj, const char *label)
 	kfree_const(old_label);
 }
 
+int
+panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
+		  u32 offset, u32 size)
+{
+	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+	enum drm_gem_shmem_sync_type shmem_sync_type;
+
+	switch (type) {
+	case PANFROST_BO_SYNC_CPU_CACHE_FLUSH:
+		shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
+		break;
+	case PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
+		shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
+}
+
 void
 panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label)
 {
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index c63264464271..87b918f30baa 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -150,6 +150,8 @@ int panfrost_gem_shrinker_init(struct drm_device *dev);
 void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
 
 void panfrost_gem_set_label(struct drm_gem_object *obj, const char *label);
+int panfrost_gem_sync(struct drm_gem_object *obj, u32 type,
+		      u32 offset, u32 size);
 void panfrost_gem_internal_set_label(struct drm_gem_object *obj, const char *label);
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 9bd8fa401400..98f93dc87b34 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -24,6 +24,7 @@ extern "C" {
 #define DRM_PANFROST_SET_LABEL_BO		0x09
 #define DRM_PANFROST_JM_CTX_CREATE		0x0a
 #define DRM_PANFROST_JM_CTX_DESTROY		0x0b
+#define DRM_PANFROST_SYNC_BO			0x0c
 
 #define DRM_IOCTL_PANFROST_SUBMIT		DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
 #define DRM_IOCTL_PANFROST_WAIT_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -35,6 +36,7 @@ extern "C" {
 #define DRM_IOCTL_PANFROST_SET_LABEL_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SET_LABEL_BO, struct drm_panfrost_set_label_bo)
 #define DRM_IOCTL_PANFROST_JM_CTX_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
 #define DRM_IOCTL_PANFROST_JM_CTX_DESTROY	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
+#define DRM_IOCTL_PANFROST_SYNC_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
 
 /*
  * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -266,6 +268,49 @@ struct drm_panfrost_set_label_bo {
 	__u64 label;
 };
 
+/* Valid types to pass in drm_panfrost_bo_sync_op::type */
+#define PANFROST_BO_SYNC_CPU_CACHE_FLUSH			0
+#define PANFROST_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE		1
+
+/**
+ * struct drm_panfrost_bo_sync_op - BO sync operation
+ */
+struct drm_panfrost_bo_sync_op {
+	/** @handle: Handle of the buffer object to sync. */
+	__u32 handle;
+
+	/** @type: Type of sync operation. */
+	__u32 type;
+
+	/**
+	 * @offset: Offset into the BO at which the sync range starts.
+	 *
+	 * This will be rounded down to the nearest cache line as needed.
+	 */
+	__u32 offset;
+
+	/**
+	 * @size: Size of the range to sync
+	 *
+	 * @size + @offset will be rounded up to the nearest cache line as
+	 * needed.
+	 */
+	__u32 size;
+};
+
+/**
+ * struct drm_panfrost_sync_bo - ioctl argument for syncing BO maps
+ */
+struct drm_panfrost_sync_bo {
+	/** @ops: Array of struct drm_panfrost_bo_sync_op. */
+	__u64 ops;
+
+	/** @op_count: Number of BO sync ops. */
+	__u32 op_count;
+	/** @pad: Must be zero. */
+	__u32 pad;
+};
+
 /* Definitions for coredump decoding in user space */
 #define PANFROSTDUMP_MAJOR 1
 #define PANFROSTDUMP_MINOR 0
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 14/16] drm/panfrost: Add an ioctl to query BO flags
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (12 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 16/16] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

This is useful when importing BOs, so we can know about cacheability
and flush the caches when needed.
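
Typical usage on the import path would be something like this
(untested sketch; note that cacheability only becomes queryable once
the WB_MMAP flag is introduced later in this series):

  struct drm_panfrost_query_bo_info info = {
          .handle = bo_handle,
  };

  if (drmIoctl(fd, DRM_IOCTL_PANFROST_QUERY_BO_INFO, &info) == 0) {
          if (info.extra_flags & DRM_PANFROST_BO_IS_IMPORTED) {
                  /* Imported BO: be conservative about CPU caches. */
          }
  }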

v2:
- New commit

v3:
- Add Steve's R-b

v4:
- No changes

v5:
- No changes

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 33 +++++++++++++++++++++++++
 include/uapi/drm/panfrost_drm.h         | 19 ++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index f6692bd3fdaf..ba03a4420264 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -626,6 +626,38 @@ static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
 	return ret;
 }
 
+static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
+					struct drm_file *file_priv)
+{
+	struct drm_panfrost_query_bo_info *args = data;
+	struct drm_gem_object *gem_obj;
+	struct panfrost_gem_object *bo;
+
+	gem_obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!gem_obj) {
+		DRM_DEBUG("Failed to look up GEM BO %d\n", args->handle);
+		return -ENOENT;
+	}
+
+	bo = to_panfrost_bo(gem_obj);
+	args->pad = 0;
+	args->create_flags = 0;
+	args->extra_flags = 0;
+
+	if (drm_gem_is_imported(gem_obj)) {
+		args->extra_flags |= DRM_PANFROST_BO_IS_IMPORTED;
+	} else {
+		if (bo->noexec)
+			args->create_flags |= PANFROST_BO_NOEXEC;
+
+		if (bo->is_heap)
+			args->create_flags |= PANFROST_BO_HEAP;
+	}
+
+	drm_gem_object_put(gem_obj);
+	return 0;
+}
+
 int panfrost_unstable_ioctl_check(void)
 {
 	if (!unstable_ioctls)
@@ -696,6 +728,7 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
 	PANFROST_IOCTL(JM_CTX_CREATE,	jm_ctx_create,	DRM_RENDER_ALLOW),
 	PANFROST_IOCTL(JM_CTX_DESTROY,	jm_ctx_destroy,	DRM_RENDER_ALLOW),
 	PANFROST_IOCTL(SYNC_BO,		sync_bo,	DRM_RENDER_ALLOW),
+	PANFROST_IOCTL(QUERY_BO_INFO,	query_bo_info,	DRM_RENDER_ALLOW),
 };
 
 static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev,
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 98f93dc87b34..743c79a38f1b 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -25,6 +25,7 @@ extern "C" {
 #define DRM_PANFROST_JM_CTX_CREATE		0x0a
 #define DRM_PANFROST_JM_CTX_DESTROY		0x0b
 #define DRM_PANFROST_SYNC_BO			0x0c
+#define DRM_PANFROST_QUERY_BO_INFO		0x0d
 
 #define DRM_IOCTL_PANFROST_SUBMIT		DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
 #define DRM_IOCTL_PANFROST_WAIT_BO		DRM_IOW(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
@@ -37,6 +38,7 @@ extern "C" {
 #define DRM_IOCTL_PANFROST_JM_CTX_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_CREATE, struct drm_panfrost_jm_ctx_create)
 #define DRM_IOCTL_PANFROST_JM_CTX_DESTROY	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_JM_CTX_DESTROY, struct drm_panfrost_jm_ctx_destroy)
 #define DRM_IOCTL_PANFROST_SYNC_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SYNC_BO, struct drm_panfrost_sync_bo)
+#define DRM_IOCTL_PANFROST_QUERY_BO_INFO	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_QUERY_BO_INFO, struct drm_panfrost_query_bo_info)
 
 /*
  * Unstable ioctl(s): only exposed when the unsafe unstable_ioctls module
@@ -311,6 +313,23 @@ struct drm_panfrost_sync_bo {
 	__u32 pad;
 };
 
+/** BO was imported from a dma-buf rather than allocated by this driver. */
+#define DRM_PANFROST_BO_IS_IMPORTED (1 << 0)
+
+struct drm_panfrost_query_bo_info {
+	/** @handle: Handle of the object being queried. */
+	__u32 handle;
+
+	/** @extra_flags: Flags that don't come from the BO_CREATE ioctl(). */
+	__u32 extra_flags;
+
+	/** @create_flags: Flags passed at creation time. */
+	__u32 create_flags;
+
+	/** @pad: Will be zero on return. */
+	__u32 pad;
+};
+
 /* Definitions for coredump decoding in user space */
 #define PANFROSTDUMP_MAJOR 1
 #define PANFROSTDUMP_MINOR 0
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (13 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 14/16] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 16/16] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel, Boris Brezillon

From: Faith Ekstrand <faith.ekstrand@collabora.com>

Will be used by the UMD to optimize CPU accesses to buffers
that are frequently read by the CPU, or on which the access
pattern makes non-cacheable mappings inefficient.

Mapping buffers CPU-cached implies taking care of the CPU
cache maintenance in the UMD, unless the GPU is IO coherent.
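
Allocating a CPU-cached BO would look roughly like this (untested
sketch, error handling omitted):

  struct drm_panfrost_create_bo create = {
          .size = 4096,
          .flags = PANFROST_BO_NOEXEC | PANFROST_BO_WB_MMAP,
  };

  if (drmIoctl(fd, DRM_IOCTL_PANFROST_CREATE_BO, &create) == 0) {
          /* create.handle can now be mmap()ed write-back cacheable.
           * The UMD owns the cache maintenance (SYNC_BO) unless the
           * GPU is IO coherent.
           */
  }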

v2:
- Add more to the commit message

v3:
- No changes

v4:
- Fix the map_wc test in panfrost_ioctl_query_bo_info()

v5:
- Drop Steve's R-b (enough has changed to justify a new review)

Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 10 ++++++--
 drivers/gpu/drm/panfrost/panfrost_gem.c | 33 +++++++++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h |  5 ++++
 include/uapi/drm/panfrost_drm.h         |  5 +++-
 4 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ba03a4420264..74b7dc75d88b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -125,6 +125,10 @@ static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct
 	return 0;
 }
 
+#define PANFROST_BO_FLAGS	(PANFROST_BO_NOEXEC | \
+				 PANFROST_BO_HEAP | \
+				 PANFROST_BO_WB_MMAP)
+
 static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
 		struct drm_file *file)
 {
@@ -134,8 +138,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
 	struct panfrost_gem_mapping *mapping;
 	int ret;
 
-	if (!args->size || args->pad ||
-	    (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
+	if (!args->size || args->pad || (args->flags & ~PANFROST_BO_FLAGS))
 		return -EINVAL;
 
 	/* Heaps should never be executable */
@@ -652,6 +655,9 @@ static int panfrost_ioctl_query_bo_info(struct drm_device *dev, void *data,
 
 		if (bo->is_heap)
 			args->create_flags |= PANFROST_BO_HEAP;
+
+		if (!bo->base.map_wc)
+			args->create_flags |= PANFROST_BO_WB_MMAP;
 	}
 
 	drm_gem_object_put(gem_obj);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 05d3f8a6fa78..1c600939c17a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -269,6 +269,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
 	.vmap = drm_gem_shmem_object_vmap,
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = drm_gem_shmem_object_mmap,
+	.export = drm_gem_prime_export,
 	.status = panfrost_gem_status,
 	.rss = panfrost_gem_rss,
 	.vm_ops = &drm_gem_shmem_vm_ops,
@@ -302,12 +303,42 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
 	return &obj->base.base;
 }
 
+static bool
+should_map_wc(struct panfrost_gem_object *bo)
+{
+	struct panfrost_device *pfdev = to_panfrost_device(bo->base.base.dev);
+
+	/* We can't do uncached mappings if the device is coherent,
+	 * because the zeroing done by the shmem layer at page allocation
+	 * time happens on a cached mapping which isn't CPU-flushed (at least
+	 * not on Arm64 where the flush is deferred to PTE setup time, and
+	 * only done conditionally based on the mapping permissions). We can't
+	 * rely on dma_map_sgtable()/dma_sync_sgtable_for_xxx() either to flush
+	 * those, because they are NOPed if dev_is_dma_coherent() returns true.
+	 */
+	if (pfdev->coherent)
+		return false;
+
+	/* Cached mappings are explicitly requested, so no write-combine. */
+	if (bo->wb_mmap)
+		return false;
+
+	/* The default is write-combine. */
+	return true;
+}
+
 struct panfrost_gem_object *
 panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
 {
 	struct drm_gem_shmem_object *shmem;
 	struct panfrost_gem_object *bo;
 
+	/* Heap buffers are not supposed to be CPU-visible, so don't allow
+	 * WB_MMAP on those.
+	 */
+	if ((flags & PANFROST_BO_HEAP) && (flags & PANFROST_BO_WB_MMAP))
+		return ERR_PTR(-EINVAL);
+
 	/* Round up heap allocations to 2MB to keep fault handling simple */
 	if (flags & PANFROST_BO_HEAP)
 		size = roundup(size, SZ_2M);
@@ -319,6 +350,8 @@ panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
 	bo = to_panfrost_bo(&shmem->base);
 	bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
 	bo->is_heap = !!(flags & PANFROST_BO_HEAP);
+	bo->wb_mmap = !!(flags & PANFROST_BO_WB_MMAP);
+	bo->base.map_wc = should_map_wc(bo);
 
 	return bo;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 87b918f30baa..d2d532b3007a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -98,6 +98,11 @@ struct panfrost_gem_object {
 	bool noexec		:1;
 	bool is_heap		:1;
 
+	/* On coherent devices, this reflects the creation flags, not the true
+	 * cacheability attribute of the mapping.
+	 */
+	bool wb_mmap		:1;
+
 #ifdef CONFIG_DEBUG_FS
 	struct panfrost_gem_debugfs debugfs;
 #endif
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 743c79a38f1b..82f4e69bafb4 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -101,9 +101,12 @@ struct drm_panfrost_wait_bo {
 	__s64 timeout_ns;	/* absolute */
 };
 
-/* Valid flags to pass to drm_panfrost_create_bo */
+/* Valid flags to pass to drm_panfrost_create_bo.
+ * PANFROST_BO_WB_MMAP can't be set if PANFROST_BO_HEAP is.
+ */
 #define PANFROST_BO_NOEXEC	1
 #define PANFROST_BO_HEAP	2
+#define PANFROST_BO_WB_MMAP	4
 
 /**
  * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 16/16] drm/panfrost: Bump the driver version to 1.6
  2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
                   ` (14 preceding siblings ...)
  2025-10-30 14:05 ` [PATCH v5 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-10-30 14:05 ` Boris Brezillon
  15 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel, Boris Brezillon

From: Faith Ekstrand <faith.ekstrand@collabora.com>

Bump the driver version to reflect the new CPU-cached mapping
capability.

v2:
- Quickly describe what the new version exposes in the commit message

v3:
- Add Steve's R-b

v4:
- No changes

v5:
- No changes

Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 74b7dc75d88b..b0f90b85cb39 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -924,6 +924,9 @@ static void panfrost_debugfs_init(struct drm_minor *minor)
  * - 1.4 - adds SET_LABEL_BO
  * - 1.5 - adds JM_CTX_{CREATE,DESTROY} ioctls and extend SUBMIT to allow
  *	   context creation with configurable priorities/affinity
+ * - 1.6 - adds PANFROST_BO_WB_MMAP, DRM_IOCTL_PANFROST_SYNC_BO,
+ *	   DRM_IOCTL_PANFROST_QUERY_BO_INFO and
+ *	   DRM_PANFROST_PARAM_SELECTED_COHERENCY
  */
 static const struct drm_driver panfrost_drm_driver = {
 	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ,
@@ -936,7 +939,7 @@ static const struct drm_driver panfrost_drm_driver = {
 	.name			= "panfrost",
 	.desc			= "panfrost DRM",
 	.major			= 1,
-	.minor			= 5,
+	.minor			= 6,
 
 	.gem_create_object	= panfrost_gem_create_object,
 	.gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
  2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
@ 2025-10-30 14:25   ` Christian König
  2025-10-30 14:35     ` Boris Brezillon
  0 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2025-10-30 14:25 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, amd-gfx, kernel

On 10/30/25 15:05, Boris Brezillon wrote:
> drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> driver implements custom dma_buf_ops. Instead of duplicating a bunch
> of helpers to work around it, let's provide a way for drivers to
> expose their custom dma_buf_ops so the core prime helpers can rely on
> that instead of hardcoding &drm_gem_prime_dmabuf_ops.

That's generally nice to have; I've re-implemented quite a number of functions in amdgpu because of this as well.

> 
> v5:
> - New patch
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/drm_prime.c | 14 +++++++++++---
>  include/drm/drm_drv.h       |  8 ++++++++
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 43a10b4af43a..3796844af418 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -903,6 +903,15 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
>  }
>  EXPORT_SYMBOL(drm_prime_get_contiguous_size);
>  
> +static const struct dma_buf_ops *
> +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> +{
> +	if (dev->driver->gem_prime_get_dma_buf_ops)
> +		return dev->driver->gem_prime_get_dma_buf_ops(dev);

I have strong doubts that a driver changes their dma_buf ops during their runtime, so instead of a callback could we just have it as pointer in drm_driver?
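
Something like this, maybe (completely untested, field name is just a
suggestion):

  struct drm_driver {
          ...
          /* Optional. NULL means the default drm_gem_prime_dmabuf_ops. */
          const struct dma_buf_ops *gem_prime_dma_buf_ops;
  };

  static const struct dma_buf_ops *
  drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
  {
          return dev->driver->gem_prime_dma_buf_ops ?:
                 &drm_gem_prime_dmabuf_ops;
  }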

Regards,
Christian.

> +
> +	return &drm_gem_prime_dmabuf_ops;
> +}
> +
>  /**
>   * drm_gem_prime_export - helper library implementation of the export callback
>   * @obj: GEM object to export
> @@ -919,7 +928,7 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
>  	struct dma_buf_export_info exp_info = {
>  		.exp_name = KBUILD_MODNAME, /* white lie for debug */
>  		.owner = dev->driver->fops->owner,
> -		.ops = &drm_gem_prime_dmabuf_ops,
> +		.ops = drm_gem_prime_get_dma_buf_ops(dev),
>  		.size = obj->size,
>  		.flags = flags,
>  		.priv = obj,
> @@ -930,7 +939,6 @@ struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
>  }
>  EXPORT_SYMBOL(drm_gem_prime_export);
>  
> -
>  /**
>   * drm_gem_is_prime_exported_dma_buf -
>   * checks if the DMA-BUF was exported from a GEM object belonging to @dev.
> @@ -946,7 +954,7 @@ bool drm_gem_is_prime_exported_dma_buf(struct drm_device *dev,
>  {
>  	struct drm_gem_object *obj = dma_buf->priv;
>  
> -	return (dma_buf->ops == &drm_gem_prime_dmabuf_ops) && (obj->dev == dev);
> +	return (dma_buf->ops == drm_gem_prime_get_dma_buf_ops(dev)) && (obj->dev == dev);
>  }
>  EXPORT_SYMBOL(drm_gem_is_prime_exported_dma_buf);
>  
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 42fc085f986d..f18da3c0edb8 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -326,6 +326,14 @@ struct drm_driver {
>  				struct dma_buf_attachment *attach,
>  				struct sg_table *sgt);
>  
> +	/**
> +	 * @gem_prime_get_dma_buf_ops:
> +	 *
> +	 * Optional hook used by the PRIME helpers to get the custom dma_buf_ops
> +	 * used by this driver.
> +	 */
> +	const struct dma_buf_ops *(*gem_prime_get_dma_buf_ops)(struct drm_device *dev);
> +
>  	/**
>  	 * @dumb_create:
>  	 *


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation
  2025-10-30 14:05 ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
@ 2025-10-30 14:31   ` Christian König
  2025-11-04  8:08     ` Boris Brezillon
  2025-11-03 20:34   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Akash Goel
  1 sibling, 1 reply; 30+ messages in thread
From: Christian König @ 2025-10-30 14:31 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, amd-gfx, kernel

On 10/30/25 15:05, Boris Brezillon wrote:
> The default implementation simply takes care of invalidating/flushing
> caches around CPU accesses. It takes care of both the exporter and
> the importers, which forces us to overload the default
> ::[un]map_dma_buf() implementation provided by drm_gem.c to store the
> sgt.
> 
> v5:
> - New patch
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/drm_gem_shmem_helper.c | 114 +++++++++++++++++++++++++
>  include/drm/drm_gem_shmem_helper.h     |  10 +++
>  2 files changed, 124 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index dc94a27710e5..e49c75739c20 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -893,6 +893,120 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
>  }
>  EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_no_map);
>  
> +/**
> + * drm_gem_shmem_prime_map_dma_buf - Default map_dma_buf() for exported buffers
> + * @attach: attachment
> + * @dir: type of access done by this attachment
> + *
> + * Default implementation for dma_buf_ops::map_dma_buf(). This is just a wrapper
> + * around drm_gem_map_dma_buf() that lets us set the dma_buf_attachment::priv
> + * to the sgt so that drm_gem_shmem_prime_{begin,end}_cpu_access() can sync
> + * around CPU accesses.
> + */
> +struct sg_table *
> +drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> +				enum dma_data_direction dir)
> +{
> +	struct sg_table *sgt = drm_gem_map_dma_buf(attach, dir);
> +
> +	if (!IS_ERR(sgt))
> +		attach->priv = sgt;
> +
> +	return sgt;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_map_dma_buf);
> +
> +/**
> + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> + * @attach: attachment
> + * @sgt: SG table to unmap
> + * @dir: type of access done by this attachment
> + *
> + * Default implementation for dma_buf_ops::unmap_dma_buf(). This is just a wrapper
> + * around drm_gem_unmap_dma_buf() that lets us reset the dma_buf_attachment::priv
> + * field so that drm_gem_shmem_prime_{begin,end}_cpu_access() don't consider it
> + * as a mapped attachment to sync against.
> + */
> +void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> +				       struct sg_table *sgt,
> +				       enum dma_data_direction dir)
> +{
> +	attach->priv = NULL;
> +	drm_gem_unmap_dma_buf(attach, sgt, dir);
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_unmap_dma_buf);
> +
> +/**
> + * drm_gem_shmem_prime_begin_cpu_access - Default begin_cpu_access() for exported buffers
> + * @dma_buf: The exported DMA buffer this acts on
> + * @dir: direction of the access
> + *
> + * Default implementation for dma_buf_ops::begin_cpu_access(). This only takes care of
> + * cache maintenance.
> + */
> +int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
> +					 enum dma_data_direction dir)
> +{
> +	struct drm_gem_object *obj = dma_buf->priv;
> +	struct drm_device *dev = obj->dev;
> +	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
> +	struct dma_buf_attachment *attach;
> +
> +	dma_resv_lock(obj->resv, NULL);
> +	if (shmem->sgt)
> +		dma_sync_sgtable_for_cpu(dev->dev, shmem->sgt, dir);
> +
> +	if (shmem->vaddr)
> +		invalidate_kernel_vmap_range(shmem->vaddr, shmem->base.size);
> +


> +	list_for_each_entry(attach, &dma_buf->attachments, node) {
> +		struct sg_table *sgt = attach->priv;
> +
> +		if (sgt)
> +			dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
> +	}

This conflicts with the debugging hack in DMA-buf.

I've recently sent out a patch to fix that, but it hasn't been pushed yet.

Apart from that looks absolutely reasonable to me.

Regards,
Christian.

> +	dma_resv_unlock(obj->resv);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_begin_cpu_access);
> +
> +/**
> + * drm_gem_shmem_prime_end_cpu_access - Default end_cpu_access() for exported buffers
> + * @dma_buf: The exported DMA buffer this acts on
> + * @dir: direction of the access
> + *
> + * Default implementation for dma_buf_ops::end_cpu_access(). This only takes care of
> + * cache maintenance.
> + */
> +int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
> +				       enum dma_data_direction dir)
> +{
> +	struct drm_gem_object *obj = dma_buf->priv;
> +	struct drm_device *dev = obj->dev;
> +	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
> +	struct dma_buf_attachment *attach;
> +
> +	dma_resv_lock(obj->resv, NULL);
> +	list_for_each_entry(attach, &dma_buf->attachments, node) {
> +		struct sg_table *sgt = attach->priv;
> +
> +		if (sgt)
> +			dma_sync_sgtable_for_device(attach->dev, sgt, dir);
> +	}
> +
> +	if (shmem->vaddr)
> +		flush_kernel_vmap_range(shmem->vaddr, shmem->base.size);
> +
> +	if (shmem->sgt)
> +		dma_sync_sgtable_for_device(dev->dev, shmem->sgt, dir);
> +
> +	dma_resv_unlock(obj->resv);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_end_cpu_access);
> +
>  MODULE_DESCRIPTION("DRM SHMEM memory-management helpers");
>  MODULE_IMPORT_NS("DMA_BUF");
>  MODULE_LICENSE("GPL v2");
> diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
> index 589f7bfe7506..075275d6b2fd 100644
> --- a/include/drm/drm_gem_shmem_helper.h
> +++ b/include/drm/drm_gem_shmem_helper.h
> @@ -291,6 +291,16 @@ int drm_gem_shmem_dumb_create(struct drm_file *file, struct drm_device *dev,
>  			      struct drm_mode_create_dumb *args);
>  struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
>  							 struct dma_buf *buf);
> +struct sg_table *
> +drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> +				enum dma_data_direction dir);
> +void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> +				       struct sg_table *sgt,
> +				       enum dma_data_direction dir);
> +int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
> +					 enum dma_data_direction dir);
> +int drm_gem_shmem_prime_end_cpu_access(struct dma_buf *dma_buf,
> +				       enum dma_data_direction dir);
>  
>  /**
>   * DRM_GEM_SHMEM_DRIVER_OPS - Default shmem GEM operations


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops
  2025-10-30 14:25   ` Christian König
@ 2025-10-30 14:35     ` Boris Brezillon
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:35 UTC (permalink / raw)
  To: Christian König
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher, amd-gfx,
	kernel

On Thu, 30 Oct 2025 15:25:50 +0100
Christian König <christian.koenig@amd.com> wrote:

> On 10/30/25 15:05, Boris Brezillon wrote:
> > drm_gem_is_prime_exported_dma_buf() checks the dma_buf->ops against
> > drm_gem_prime_dmabuf_ops, which makes it impossible to use if the
> > driver implements custom dma_buf_ops. Instead of duplicating a bunch
> > of helpers to work around it, let's provide a way for drivers to
> > expose their custom dma_buf_ops so the core prime helpers can rely on
> > that instead of hardcoding &drm_gem_prime_dmabuf_ops.  
> 
> That's generally nice to have, I've re-implemented quite a number of functions in amdgpu because of this as well.
> 
> > 
> > v5:
> > - New patch
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/drm_prime.c | 14 +++++++++++---
> >  include/drm/drm_drv.h       |  8 ++++++++
> >  2 files changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index 43a10b4af43a..3796844af418 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -903,6 +903,15 @@ unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
> >  }
> >  EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> >  
> > +static const struct dma_buf_ops *
> > +drm_gem_prime_get_dma_buf_ops(struct drm_device *dev)
> > +{
> > +	if (dev->driver->gem_prime_get_dma_buf_ops)
> > +		return dev->driver->gem_prime_get_dma_buf_ops(dev);  
> 
> I have strong doubts that a driver changes their dma_buf ops during their runtime, so instead of a callback could we just have it as pointer in drm_driver?

Sure thing, I considered doing that too actually.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
  2025-10-30 14:05 ` [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
@ 2025-10-31  7:08   ` Marcin Ślusarz
  2025-10-31  8:49     ` Boris Brezillon
  0 siblings, 1 reply; 30+ messages in thread
From: Marcin Ślusarz @ 2025-10-31  7:08 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher,
	Christian König, amd-gfx, kernel, nd

On Thu, Oct 30, 2025 at 03:05:22PM +0100, Boris Brezillon wrote:
> +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> +				  struct drm_file *file)
> +{
> +	struct drm_panfrost_sync_bo *args = data;
> +	struct drm_panfrost_bo_sync_op *ops;
> +	struct drm_gem_object *obj;
> +	int ret;
> +	u32 i;
> +
> +	if (args->pad)
> +		return -EINVAL;
> +
> +	ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> +	if (!ops) {
> +		DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> +			   args->op_count * sizeof(*ops))) {
> +		DRM_DEBUG("Failed to copy in BO sync ops\n");
> +		ret = -EFAULT;
> +		goto err_ops;
> +	}
> +
> +	for (i = 0; i < args->op_count; i++) {
> +		obj = drm_gem_object_lookup(file, ops[i].handle);
> +		if (!obj) {
> +			ret = -ENOENT;
> +			goto err_ops;
> +		}
> +
> +		ret = panfrost_gem_sync(obj, ops[i].type,
> +					ops[i].offset, ops[i].size);
> +
> +		drm_gem_object_put(obj);
> +
> +		if (ret)
> +			goto err_ops;
> +	}
> +
> +err_ops:
> +	kvfree(ops);
> +
> +	return ret;

This function will still return garbage if args->op_count is 0.
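
Maybe reject empty op arrays up front, e.g.:

  if (args->pad || !args->op_count)
          return -EINVAL;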

> +}
> +

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
  2025-10-30 14:05 ` [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
@ 2025-10-31  7:25   ` Marcin Ślusarz
  2025-11-03 20:42   ` Akash Goel
  1 sibling, 0 replies; 30+ messages in thread
From: Marcin Ślusarz @ 2025-10-31  7:25 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher,
	Christian König, amd-gfx, kernel, nd

On Thu, Oct 30, 2025 at 03:05:16PM +0100, Boris Brezillon wrote:
> +static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
> +				 struct drm_file *file)
> +{
> +	struct drm_panthor_bo_sync *args = data;
> +	struct drm_panthor_bo_sync_op *ops;
> +	struct drm_gem_object *obj;
> +	int ret = 0;

This initialization isn't needed.

> +
> +	ret = PANTHOR_UOBJ_GET_ARRAY(ops, &args->ops);
> +	if (ret)
> +		return ret;
> +
> +	for (u32 i = 0; i < args->ops.count; i++) {
> +		obj = drm_gem_object_lookup(file, ops[i].handle);
> +		if (!obj) {
> +			ret = -ENOENT;
> +			goto err_ops;
> +		}
> +
> +		ret = panthor_gem_sync(obj, ops[i].type, ops[i].offset,
> +				       ops[i].size);
> +
> +		drm_gem_object_put(obj);
> +
> +		if (ret)
> +			goto err_ops;
> +	}
> +
> +err_ops:
> +	kvfree(ops);
> +
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl
  2025-10-31  7:08   ` Marcin Ślusarz
@ 2025-10-31  8:49     ` Boris Brezillon
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-31  8:49 UTC (permalink / raw)
  To: Marcin Ślusarz
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher,
	Christian König, amd-gfx, kernel, nd

On Fri, 31 Oct 2025 08:08:20 +0100
Marcin Ślusarz <marcin.slusarz@arm.com> wrote:

> On Thu, Oct 30, 2025 at 03:05:22PM +0100, Boris Brezillon wrote:
> > +static int panfrost_ioctl_sync_bo(struct drm_device *ddev, void *data,
> > +				  struct drm_file *file)
> > +{
> > +	struct drm_panfrost_sync_bo *args = data;
> > +	struct drm_panfrost_bo_sync_op *ops;
> > +	struct drm_gem_object *obj;
> > +	int ret;
> > +	u32 i;
> > +
> > +	if (args->pad)
> > +		return -EINVAL;
> > +
> > +	ops = kvmalloc_array(args->op_count, sizeof(*ops), GFP_KERNEL);
> > +	if (!ops) {
> > +		DRM_DEBUG("Failed to allocate incoming BO sync ops array\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (copy_from_user(ops, (void __user *)(uintptr_t)args->ops,
> > +			   args->op_count * sizeof(*ops))) {
> > +		DRM_DEBUG("Failed to copy in BO sync ops\n");
> > +		ret = -EFAULT;
> > +		goto err_ops;
> > +	}
> > +
> > +	for (i = 0; i < args->op_count; i++) {
> > +		obj = drm_gem_object_lookup(file, ops[i].handle);
> > +		if (!obj) {
> > +			ret = -ENOENT;
> > +			goto err_ops;
> > +		}
> > +
> > +		ret = panfrost_gem_sync(obj, ops[i].type,
> > +					ops[i].offset, ops[i].size);
> > +
> > +		drm_gem_object_put(obj);
> > +
> > +		if (ret)
> > +			goto err_ops;
> > +	}
> > +
> > +err_ops:
> > +	kvfree(ops);
> > +
> > +	return ret;  
> 
> This function will still return garbage if args->op_count is 0.

Sorry, this fell through the cracks. Will be fixed in v6.
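
(For reference, a minimal fix, assuming v6 keeps the current structure
— a hypothetical sketch, not the actual v6 code — is to reject an
empty array before the allocation, so 'ret' can never be returned
uninitialized:)

	if (args->pad)
		return -EINVAL;

	/* An empty op array skips the loop and reaches 'return ret'
	 * without 'ret' ever being assigned; reject it up front.
	 */
	if (!args->op_count)
		return -EINVAL;

(Initializing 'ret' to 0 and treating an empty array as a successful
no-op would work just as well.)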

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable
  2025-10-30 14:05 ` [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
@ 2025-11-03 16:43   ` Akash Goel
  2025-11-03 17:13     ` Boris Brezillon
  0 siblings, 1 reply; 30+ messages in thread
From: Akash Goel @ 2025-11-03 16:43 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Loïc Molinari, kernel



On 10/30/25 14:05, Boris Brezillon wrote:
> From: Loïc Molinari <loic.molinari@collabora.com>
>
> Will be used by the UMD to optimize CPU accesses to buffers
> that are frequently read by the CPU, or on which the access
> pattern makes non-cacheable mappings inefficient.
>
> Mapping buffers CPU-cached implies taking care of the CPU
> cache maintenance in the UMD, unless the GPU is IO coherent.
>
> v2:
> - Add more to the commit message
> - Tweak the doc
> - Make sure we sync the section of the BO pointing to the CS
>    syncobj before we read its seqno
>
> v3:
> - Fix formatting/spelling issues
>
> v4:
> - Add Steve's R-b
>
> v5:
> - Drop Steve's R-b (changes in the ioctl semantics requiring
>    new review)
>
> Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>   drivers/gpu/drm/panthor/panthor_drv.c   |  7 ++++-
>   drivers/gpu/drm/panthor/panthor_gem.c   | 37 +++++++++++++++++++++++--
>   drivers/gpu/drm/panthor/panthor_sched.c | 18 ++++++++++--
>   include/uapi/drm/panthor_drm.h          | 12 ++++++++
>   4 files changed, 69 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index c07fc5dcd4a1..4e915f5ef3fa 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -900,7 +900,8 @@ static int panthor_ioctl_vm_destroy(struct drm_device *ddev, void *data,
>       return panthor_vm_pool_destroy_vm(pfile->vms, args->id);
>   }
>
> -#define PANTHOR_BO_FLAGS             DRM_PANTHOR_BO_NO_MMAP
> +#define PANTHOR_BO_FLAGS             (DRM_PANTHOR_BO_NO_MMAP | \
> +                                      DRM_PANTHOR_BO_WB_MMAP)
>
>   static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
>                                  struct drm_file *file)
> @@ -919,6 +920,10 @@ static int panthor_ioctl_bo_create(struct drm_device *ddev, void *data,
>               goto out_dev_exit;
>       }
>
> +     if ((args->flags & DRM_PANTHOR_BO_NO_MMAP) &&
> +         (args->flags & DRM_PANTHOR_BO_WB_MMAP))
> +             return -EINVAL;
> +
>       if (args->exclusive_vm_id) {
>               vm = panthor_vm_pool_get_vm(pfile->vms, args->exclusive_vm_id);
>               if (!vm) {
> diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
> index 1b1e98c61b8c..479a779ee59d 100644
> --- a/drivers/gpu/drm/panthor/panthor_gem.c
> +++ b/drivers/gpu/drm/panthor/panthor_gem.c
> @@ -58,6 +58,39 @@ static void panthor_gem_debugfs_set_usage_flags(struct panthor_gem_object *bo, u
>   static void panthor_gem_debugfs_bo_init(struct panthor_gem_object *bo) {}
>   #endif
>
> +static bool
> +should_map_wc(struct panthor_gem_object *bo, struct panthor_vm *exclusive_vm)
> +{
> +     struct panthor_device *ptdev = container_of(bo->base.base.dev, struct panthor_device, base);
> +
> +     /* We can't do uncached mappings if the device is coherent,
> +      * because the zeroing done by the shmem layer at page allocation
> +      * time happens on a cached mapping which isn't CPU-flushed (at least
> +      * not on Arm64 where the flush is deferred to PTE setup time, and
> +      * only done conditionally based on the mapping permissions). We can't
> +      * rely on dma_map_sgtable()/dma_sync_sgtable_for_xxx() either to flush
> +      * those, because they are NOPed if dma_dev_coherent() returns true.
> +      *
> +      * FIXME: Note that this problem is going to pop up again when we
> +      * decide to support mapping buffers with the NO_MMAP flag as
> +      * non-shareable (AKA buffers accessed only by the GPU), because we
> +      * need the same CPU flush to happen after page allocation, otherwise
> +      * there's a risk of data leak or late corruption caused by a dirty
> +      * cacheline being evicted. At this point we'll need a way to force
> +      * CPU cache maintenance regardless of whether the device is coherent
> +      * or not.
> +      */
> +     if (ptdev->coherent)
> +             return false;
> +
> +     /* Cached mappings are explicitly requested, so no write-combine. */
> +     if (bo->flags & DRM_PANTHOR_BO_WB_MMAP)
> +             return false;
> +
> +     /* The default is write-combine. */
> +     return true;
> +}
> +
>   static void panthor_gem_free_object(struct drm_gem_object *obj)
>   {
>       struct panthor_gem_object *bo = to_panthor_bo(obj);
> @@ -152,6 +185,7 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, struct panthor_vm *vm,
>       bo = to_panthor_bo(&obj->base);
>       kbo->obj = &obj->base;
>       bo->flags = bo_flags;
> +     bo->base.map_wc = should_map_wc(bo, vm);
>
>       if (vm == panthor_fw_vm(ptdev))
>               debug_flags |= PANTHOR_DEBUGFS_GEM_USAGE_FLAG_FW_MAPPED;
> @@ -255,7 +289,6 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = {
>    */
>   struct drm_gem_object *panthor_gem_create_object(struct drm_device *ddev, size_t size)
>   {
> -     struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
>       struct panthor_gem_object *obj;
>
>       obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> @@ -263,7 +296,6 @@ struct drm_gem_object *panthor_gem_create_object(struct drm_device *ddev, size_t
>               return ERR_PTR(-ENOMEM);
>
>       obj->base.base.funcs = &panthor_gem_funcs;
> -     obj->base.map_wc = !ptdev->coherent;
>       mutex_init(&obj->label.lock);
>
>       panthor_gem_debugfs_bo_init(obj);
> @@ -298,6 +330,7 @@ panthor_gem_create_with_handle(struct drm_file *file,
>
>       bo = to_panthor_bo(&shmem->base);
>       bo->flags = flags;
> +     bo->base.map_wc = should_map_wc(bo, exclusive_vm);
>
>       if (exclusive_vm) {
>               bo->exclusive_vm_root_gem = panthor_vm_root_gem(exclusive_vm);
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index f5e01cb16cfc..7d5da5717de2 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -868,8 +868,11 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
>       struct iosys_map map;
>       int ret;
>
> -     if (queue->syncwait.kmap)
> -             return queue->syncwait.kmap + queue->syncwait.offset;
> +     if (queue->syncwait.kmap) {
> +             bo = container_of(queue->syncwait.obj,
> +                               struct panthor_gem_object, base.base);
> +             goto out_sync;
> +     }
>
>       bo = panthor_vm_get_bo_for_va(group->vm,
>                                     queue->syncwait.gpu_va,
> @@ -886,6 +889,17 @@ panthor_queue_get_syncwait_obj(struct panthor_group *group, struct panthor_queue
>       if (drm_WARN_ON(&ptdev->base, !queue->syncwait.kmap))
>               goto err_put_syncwait_obj;
>
> +out_sync:
> +     /* Make sure the CPU caches are invalidated before the seqno is read.
> +      * drm_gem_shmem_sync() is a NOP if map_wc=false, so no need to check

Sorry, nitpick.

IIUC, drm_gem_shmem_sync() would be a NOP if 'map_wc' is true.



> +      * it here.
> +      */
> +     drm_gem_shmem_sync(&bo->base, queue->syncwait.offset,
> +                        queue->syncwait.sync64 ?
> +                                sizeof(struct panthor_syncobj_64b) :
> +                                sizeof(struct panthor_syncobj_32b),
> +                        DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE);
> +
>       return queue->syncwait.kmap + queue->syncwait.offset;
>
>   err_put_syncwait_obj:
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index 7eec9f922183..57e2f5ffa03c 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -681,6 +681,18 @@ struct drm_panthor_vm_get_state {
>   enum drm_panthor_bo_flags {
>       /** @DRM_PANTHOR_BO_NO_MMAP: The buffer object will never be CPU-mapped in userspace. */
>       DRM_PANTHOR_BO_NO_MMAP = (1 << 0),
> +
> +     /**
> +      * @DRM_PANTHOR_BO_WB_MMAP: Force "Write-Back Cacheable" CPU mapping.
> +      *
> +      * CPU map the buffer object in userspace by forcing the "Write-Back
> +      * Cacheable" cacheability attribute. The mapping otherwise uses the
> +      * "Non-Cacheable" attribute if the GPU is not IO coherent.
> +      *
> +      * Can't be set if exclusive_vm_id=0 (only private BOs can be mapped
> +      * cacheable).

Sorry Boris, I may have misinterpreted the code.

As per the comment, DRM_PANTHOR_BO_WB_MMAP flag should be rejected if
'exclusive_vm' is NULL. But I don't see any check for 'exclusive_vm'
pointer inside should_map_wc().

Could you please clarify?

Best regards
Akash


> +      */
> +     DRM_PANTHOR_BO_WB_MMAP = (1 << 1),
>   };
>
>   /**

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable
  2025-11-03 16:43   ` Akash Goel
@ 2025-11-03 17:13     ` Boris Brezillon
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-11-03 17:13 UTC (permalink / raw)
  To: Akash Goel
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher,
	Christian König, amd-gfx, Loïc Molinari, kernel

On Mon, 3 Nov 2025 16:43:12 +0000
Akash Goel <akash.goel@arm.com> wrote:

> On 10/30/25 14:05, Boris Brezillon wrote:
> > [...]
> > +out_sync:
> > +     /* Make sure the CPU caches are invalidated before the seqno is read.
> > +      * drm_gem_shmem_sync() is a NOP if map_wc=false, so no need to check  
> 
> Sorry, nitpick.
> 
> IIUC, drm_gem_shmem_sync() would be a NOP if 'map_wc' is true.

Oops, will fix that.

> > [...]
> > +     /**
> > +      * @DRM_PANTHOR_BO_WB_MMAP: Force "Write-Back Cacheable" CPU mapping.
> > +      *
> > +      * CPU map the buffer object in userspace by forcing the "Write-Back
> > +      * Cacheable" cacheability attribute. The mapping otherwise uses the
> > +      * "Non-Cacheable" attribute if the GPU is not IO coherent.
> > +      *
> > +      * Can't be set if exclusive_vm_id=0 (only private BOs can be mapped
> > +      * cacheable).  
> 
> Sorry Boris, I may have misinterpreted the code.
> 
> As per the comment, DRM_PANTHOR_BO_WB_MMAP flag should be rejected if
> 'exclusive_vm' is NULL. But I don't see any check for 'exclusive_vm'
> pointer inside should_map_wc().

You're right, I had this behavior enforced at some point, and dropped
it after adding {begin,end}_cpu_access() implementations to panthor.
I'll revisit the comment or re-introduce the check in v6 based on how
the review process goes.
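
(For illustration only: if the check is re-introduced, a minimal
sketch of enforcing the documented exclusive_vm_id=0 restriction in
panthor_ioctl_bo_create() might look like the following — a
hypothetical shape, not the actual v6 code:)

	/* Only VM-private BOs may be mapped CPU-cacheable. */
	if ((args->flags & DRM_PANTHOR_BO_WB_MMAP) &&
	    !args->exclusive_vm_id)
		return -EINVAL;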

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation
  2025-10-30 14:05 ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
  2025-10-30 14:31   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation Christian König
@ 2025-11-03 20:34   ` Akash Goel
  2025-11-04  7:42     ` Boris Brezillon
  1 sibling, 1 reply; 30+ messages in thread
From: Akash Goel @ 2025-11-03 20:34 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel



On 10/30/25 14:05, Boris Brezillon wrote:
> The default implementation simply takes care of invalidating/flushing
> caches around CPU accesses. It takes care of both the exporter and
> the importers, which forces us to overload the default
> ::[un]map_dma_buf() implementation provided by drm_gem.c to store the
> sgt.
>
> v5:
> - New patch
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>   drivers/gpu/drm/drm_gem_shmem_helper.c | 114 +++++++++++++++++++++++++
>   include/drm/drm_gem_shmem_helper.h     |  10 +++
>   2 files changed, 124 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index dc94a27710e5..e49c75739c20 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -893,6 +893,120 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
>   }
>   EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_no_map);
>
> +/**
> + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> + * @attach: attachment
> + * @sgt: SG table to unmap
> + * @dir: type of access done by this attachment
> + *
> + * Default implementation for dma_buf_ops::map_dma_buf(). This is just a wrapper
> + * around drm_gem_map_dma_buf() that lets us set the dma_buf_attachment::priv
> + * to the sgt so that drm_gem_shmem_prime_{begin,end}_cpu_access() can sync
> + * around CPU accesses.
> + */
> +struct sg_table *
> +drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> +                             enum dma_data_direction dir)
> +{
> +     struct sg_table *sgt = drm_gem_map_dma_buf(attach, dir);
> +
> +     if (!IS_ERR(sgt))
> +             attach->priv = sgt;
> +
> +     return sgt;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_map_dma_buf);
> +
> +/**
> + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> + * @attach: attachment
> + * @sgt: SG table to unmap
> + * @dir: type of access done by this attachment
> + *
> + * Default implementation for dma_buf_ops::unmap_dma_buf(). This is just a wrapper
> + * around drm_gem_unmap_dma_buf() that lets us reset the dma_buf_attachment::priv
> + * field so that drm_gem_shmem_prime_{begin,end}_cpu_access() don't consider it
> + * as a mapped attachment to sync against.
> + */
> +void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> +                                    struct sg_table *sgt,
> +                                    enum dma_data_direction dir)
> +{
> +     attach->priv = NULL;
> +     drm_gem_unmap_dma_buf(attach, sgt, dir);
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_unmap_dma_buf);
> +
> +/**
> + * drm_gem_shmem_prime_begin_cpu_access - Default end_cpu_access() for exported buffers


Sorry, nitpick: there is a typo here. It should be 'Default begin_cpu_access()'.


> + * @dma_buf: The exported DMA buffer this acts on
> + * @dir: direction of the access
> + *
> + * Default implementation for dma_buf_ops::begin_cpu_access(). This only takes care of
> + * cache maintenance.
> + */
> +int drm_gem_shmem_prime_begin_cpu_access(struct dma_buf *dma_buf,
> +                                      enum dma_data_direction dir)
> +{
> +     struct drm_gem_object *obj = dma_buf->priv;
> +     struct drm_device *dev = obj->dev;
> +     struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
> +     struct dma_buf_attachment *attach;
> +
> +     dma_resv_lock(obj->resv, NULL);
> +     if (shmem->sgt)
> +             dma_sync_sgtable_for_cpu(dev->dev, shmem->sgt, dir);
> +
> +     if (shmem->vaddr)
> +             invalidate_kernel_vmap_range(shmem->vaddr, shmem->base.size);
> +
> +     list_for_each_entry(attach, &dma_buf->attachments, node) {
> +             struct sg_table *sgt = attach->priv;
> +
> +             if (sgt)
> +                     dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
> +     }
> +     dma_resv_unlock(obj->resv);
> +
> +     return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_begin_cpu_access);
> +
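(For context, these hooks are what userspace reaches through the
generic DMA_BUF_IOCTL_SYNC interface; a minimal sketch of bracketing a
CPU read of an exported buffer, where 'dmabuf_fd' is an assumed
dma-buf file descriptor obtained from the exporter:)

	#include <linux/dma-buf.h>
	#include <sys/ioctl.h>

	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
	};

	/* Invalidate CPU caches before reading: begin_cpu_access(). */
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	/* ... CPU reads through the mmap'ed view ... */

	/* Signal the end of the CPU access window: end_cpu_access(). */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);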

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
  2025-10-30 14:05 ` [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
  2025-10-31  7:25   ` Marcin Ślusarz
@ 2025-11-03 20:42   ` Akash Goel
  2025-11-04  7:41     ` Boris Brezillon
  1 sibling, 1 reply; 30+ messages in thread
From: Akash Goel @ 2025-11-03 20:42 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, kernel



On 10/30/25 14:05, Boris Brezillon wrote:
> This will be used by the UMD to synchronize CPU-cached mappings when
> the UMD can't do it directly (no usermode cache maintenance instruction
> on Arm32).
>
> v2:
> - Change the flags so they better match the drm_gem_shmem_sync()
>    semantics
>
> v3:
> - Add Steve's R-b
>
> v4:
> - No changes
>
> v5:
> - Drop Steve's R-b (the semantics changes call for a new review)
>
> Signed-off-by: Faith Ekstrand <faith.ekstrand@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>   drivers/gpu/drm/panthor/panthor_drv.c | 42 +++++++++++++++++++++-
>   drivers/gpu/drm/panthor/panthor_gem.c | 21 +++++++++++
>   drivers/gpu/drm/panthor/panthor_gem.h |  3 ++
>   include/uapi/drm/panthor_drm.h        | 52 +++++++++++++++++++++++++++
>   4 files changed, 117 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 99a4534c0074..cad5c4270b04 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
>                PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
>                PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
>                PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
> -              PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
> +              PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
> +              PANTHOR_UOBJ_DECL(struct drm_panthor_bo_sync_op, size))
>
>   /**
>    * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
> @@ -1394,6 +1395,44 @@ static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev,
>       return 0;
>   }
>
> +#define PANTHOR_BO_SYNC_OP_FLAGS                                      \
> +     (DRM_PANTHOR_BO_SYNC_FOR_DEV | DRM_PANTHOR_BO_SYNC_FOR_READ | \
> +      DRM_PANTHOR_BO_SYNC_FOR_WRITE)
> +


Sorry, I couldn't find where the PANTHOR_BO_SYNC_OP_FLAGS and
DRM_PANTHOR_BO_SYNC_FOR_xxx macros get used.



> +static int panthor_ioctl_bo_sync(struct drm_device *ddev, void *data,
> +                              struct drm_file *file)
> +{
> +     struct drm_panthor_bo_sync *args = data;
> +     struct drm_panthor_bo_sync_op *ops;
> +     struct drm_gem_object *obj;
> +     int ret = 0;
> +
> +     ret = PANTHOR_UOBJ_GET_ARRAY(ops, &args->ops);
> +     if (ret)
> +             return ret;
> +
> +     for (u32 i = 0; i < args->ops.count; i++) {
> +             obj = drm_gem_object_lookup(file, ops[i].handle);
> +             if (!obj) {
> +                     ret = -ENOENT;
> +                     goto err_ops;
> +             }
> +
> +             ret = panthor_gem_sync(obj, ops[i].type, ops[i].offset,
> +                                    ops[i].size);
> +
> +             drm_gem_object_put(obj);
> +
> +             if (ret)
> +                     goto err_ops;
> +     }
> +
> +err_ops:
> +     kvfree(ops);
> +
> +     return ret;
> +}
> +
>   static int
>   panthor_open(struct drm_device *ddev, struct drm_file *file)
>   {
> @@ -1468,6 +1507,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
>       PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
>       PANTHOR_IOCTL(BO_SET_LABEL, bo_set_label, DRM_RENDER_ALLOW),
>       PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW),
> +     PANTHOR_IOCTL(BO_SYNC, bo_sync, DRM_RENDER_ALLOW),
>   };
>
>   static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
> diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
> index 160692e45f44..1b1e98c61b8c 100644
> --- a/drivers/gpu/drm/panthor/panthor_gem.c
> +++ b/drivers/gpu/drm/panthor/panthor_gem.c
> @@ -357,6 +357,27 @@ panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label)
>       panthor_gem_bo_set_label(bo->obj, str);
>   }
>
> +int
> +panthor_gem_sync(struct drm_gem_object *obj, u32 type,
> +              u64 offset, u64 size)
> +{
> +     enum drm_gem_shmem_sync_type shmem_sync_type;
> +     struct panthor_gem_object *bo = to_panthor_bo(obj);
> +
> +     switch (type) {
> +     case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH:
> +             shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH;
> +             break;
> +     case DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE:
> +             shmem_sync_type = DRM_GEM_SHMEM_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE;
> +             break;
> +     default:
> +             return -EINVAL;
> +     }
> +
> +     return drm_gem_shmem_sync(&bo->base, offset, size, shmem_sync_type);
> +}
> +
>   #ifdef CONFIG_DEBUG_FS
>   struct gem_size_totals {
>       size_t size;
> diff --git a/drivers/gpu/drm/panthor/panthor_gem.h b/drivers/gpu/drm/panthor/panthor_gem.h
> index 528088839468..8705c492c5b6 100644
> --- a/drivers/gpu/drm/panthor/panthor_gem.h
> +++ b/drivers/gpu/drm/panthor/panthor_gem.h
> @@ -147,6 +147,9 @@ panthor_gem_create_with_handle(struct drm_file *file,
>   void panthor_gem_bo_set_label(struct drm_gem_object *obj, const char *label);
>   void panthor_gem_kernel_bo_set_label(struct panthor_kernel_bo *bo, const char *label);
>
> +int panthor_gem_sync(struct drm_gem_object *obj,
> +                  u32 type, u64 offset, u64 size);
> +
>   const struct dma_buf_ops *
>   panthor_gem_prime_get_dma_buf_ops(struct drm_device *dev);
>
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index f0f637e0631d..bb12760abe99 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -144,6 +144,9 @@ enum drm_panthor_ioctl_id {
>        * pgoff_t size.
>        */
>       DRM_PANTHOR_SET_USER_MMIO_OFFSET,
> +
> +     /** @DRM_PANTHOR_BO_SYNC: Sync BO data to/from the device */
> +     DRM_PANTHOR_BO_SYNC,
>   };
>
>   /**
> @@ -1073,6 +1076,53 @@ struct drm_panthor_set_user_mmio_offset {
>       __u64 offset;
>   };
>
> +/**
> + * enum drm_panthor_bo_sync_op_type - BO sync type
> + */
> +enum drm_panthor_bo_sync_op_type {
> +     /** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH: Flush CPU caches. */
> +     DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH = 0,
> +
> +     /** @DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE: Flush and invalidate CPU caches. */
> +     DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE = 1,
> +};
> +
> +/**
> + * struct drm_panthor_bo_sync_op - BO map sync op
> + */
> +struct drm_panthor_bo_sync_op {
> +     /** @handle: Handle of the buffer object to sync. */
> +     __u32 handle;
> +
> +     /** @type: Type of operation. */
> +     __u32 type;
> +
> +     /**
> +      * @offset: Offset into the BO at which the sync range starts.
> +      *
> +      * This will be rounded down to the nearest cache line as needed.
> +      */
> +     __u64 offset;
> +
> +     /**
> +      * @size: Size of the range to sync
> +      *
> +      * @size + @offset will be rounded up to the nearest cache line as
> +      * needed.
> +      */
> +     __u64 size;
> +};
> +
> +/**
> + * struct drm_panthor_bo_sync - BO map sync request
> + */
> +struct drm_panthor_bo_sync {
> +     /**
> +      * @ops: Array of struct drm_panthor_bo_sync_op sync operations.
> +      */
> +     struct drm_panthor_obj_array ops;
> +};
> +
>   /**
>    * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
>    * @__access: Access type. Must be R, W or RW.
> @@ -1119,6 +1169,8 @@ enum {
>               DRM_IOCTL_PANTHOR(WR, BO_SET_LABEL, bo_set_label),
>       DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET =
>               DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset),
> +     DRM_IOCTL_PANTHOR_BO_SYNC =
> +             DRM_IOCTL_PANTHOR(WR, BO_SYNC, bo_sync),
>   };
>
>   #if defined(__cplusplus)
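(For illustration, a sketch of how a UMD might drive the new ioctl,
assuming the uapi header from this series and libdrm, and using
panthor's existing DRM_PANTHOR_OBJ_ARRAY initializer; 'fd',
'bo_handle' and 'bo_size' are placeholders:)

	#include <xf86drm.h>
	#include "panthor_drm.h"

	struct drm_panthor_bo_sync_op op = {
		.handle = bo_handle,
		.type = DRM_PANTHOR_BO_SYNC_CPU_CACHE_FLUSH_AND_INVALIDATE,
		.offset = 0,
		.size = bo_size,
	};
	struct drm_panthor_bo_sync req = {
		.ops = DRM_PANTHOR_OBJ_ARRAY(1, &op),
	};

	/* Flush and invalidate the CPU caches for the whole BO before
	 * reading GPU-written data through a cached CPU mapping. */
	int ret = drmIoctl(fd, DRM_IOCTL_PANTHOR_BO_SYNC, &req);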

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl
  2025-11-03 20:42   ` Akash Goel
@ 2025-11-04  7:41     ` Boris Brezillon
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-11-04  7:41 UTC (permalink / raw)
  To: Akash Goel
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher,
	Christian König, amd-gfx, kernel

On Mon, 3 Nov 2025 20:42:05 +0000
Akash Goel <akash.goel@arm.com> wrote:

> On 10/30/25 14:05, Boris Brezillon wrote:
> > [...]
> > +#define PANTHOR_BO_SYNC_OP_FLAGS                                      \
> > +     (DRM_PANTHOR_BO_SYNC_FOR_DEV | DRM_PANTHOR_BO_SYNC_FOR_READ | \
> > +      DRM_PANTHOR_BO_SYNC_FOR_WRITE)
> > +  
> 
> 
> Sorry, I couldn't find where the PANTHOR_BO_SYNC_OP_FLAGS and
> DRM_PANTHOR_BO_SYNC_FOR_xxx macros get used.

That's a leftover from v4; it's not supposed to be there. I'll drop
it.

> > [...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation
  2025-11-03 20:34   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Akash Goel
@ 2025-11-04  7:42     ` Boris Brezillon
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-11-04  7:42 UTC (permalink / raw)
  To: Akash Goel
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher,
	Christian König, amd-gfx, kernel

On Mon, 3 Nov 2025 20:34:39 +0000
Akash Goel <akash.goel@arm.com> wrote:

> On 10/30/25 14:05, Boris Brezillon wrote:
> > The default implementation simply takes care of invalidating/flushing
> > caches around CPU accesses. It takes care of both the exporter and
> > the importers, which forces us to overload the default
> > ::[un]map_dma_buf() implementation provided by drm_gem.c to store the
> > sgt.
> >
> > v5:
> > - New patch
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >   drivers/gpu/drm/drm_gem_shmem_helper.c | 114 +++++++++++++++++++++++++
> >   include/drm/drm_gem_shmem_helper.h     |  10 +++
> >   2 files changed, 124 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > index dc94a27710e5..e49c75739c20 100644
> > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > @@ -893,6 +893,120 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev,
> >   }
> >   EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_no_map);
> >
> > +/**
> > + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> > + * @attach: attachment
> > + * @sgt: SG table to unmap
> > + * @dir: type of access done by this attachment
> > + *
> > + * Default implementation for dma_buf_ops::map_dma_buf(). This is just a wrapper
> > + * around drm_gem_map_dma_buf() that lets us set the dma_buf_attachment::priv
> > + * to the sgt so that drm_gem_shmem_prime_{begin,end}_cpu_access() can sync
> > + * around CPU accesses.
> > + */
> > +struct sg_table *
> > +drm_gem_shmem_prime_map_dma_buf(struct dma_buf_attachment *attach,
> > +                             enum dma_data_direction dir)
> > +{
> > +     struct sg_table *sgt = drm_gem_map_dma_buf(attach, dir);
> > +
> > +     if (!IS_ERR(sgt))
> > +             attach->priv = sgt;
> > +
> > +     return sgt;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_map_dma_buf);
> > +
> > +/**
> > + * drm_gem_shmem_prime_unmap_dma_buf - Default unmap_dma_buf() for exported buffers
> > + * @attach: attachment
> > + * @sgt: SG table to unmap
> > + * @dir: type of access done by this attachment
> > + *
> > + * Default implementation for dma_buf_ops::unmap_dma_buf(). This is just a wrapper
> > + * around drm_gem_unmap_dma_buf() that lets us reset the dma_buf_attachment::priv
> > + * field so that drm_gem_shmem_prime_{begin,end}_cpu_access() don't consider it
> > + * as a mapped attachment to sync against.
> > + */
> > +void drm_gem_shmem_prime_unmap_dma_buf(struct dma_buf_attachment *attach,
> > +                                    struct sg_table *sgt,
> > +                                    enum dma_data_direction dir)
> > +{
> > +     attach->priv = NULL;
> > +     drm_gem_unmap_dma_buf(attach, sgt, dir);
> > +}
> > +EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_unmap_dma_buf);
> > +
> > +/**
> > + * drm_gem_shmem_prime_begin_cpu_access - Default end_cpu_access() for exported buffers  
> 
> 
> Sorry, nitpick: there is a typo here. It should be 'Default begin_cpu_access()'.

Nice catch. Will fix in v6.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation
  2025-10-30 14:31   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation Christian König
@ 2025-11-04  8:08     ` Boris Brezillon
  0 siblings, 0 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-11-04  8:08 UTC (permalink / raw)
  To: Christian König
  Cc: Steven Price, dri-devel, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Faith Ekstrand,
	Thierry Reding, Mikko Perttunen, Melissa Wen, Maíra Canal,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Frank Binns,
	Matt Coster, Rob Clark, Dmitry Baryshkov, Abhinav Kumar,
	Jessica Zhang, Sean Paul, Marijn Suijten, Alex Deucher, amd-gfx,
	kernel

On Thu, 30 Oct 2025 15:31:44 +0100
Christian König <christian.koenig@amd.com> wrote:

> On 10/30/25 15:05, Boris Brezillon wrote:
> > [...]
> > +	list_for_each_entry(attach, &dma_buf->attachments, node) {
> > +		struct sg_table *sgt = attach->priv;
> > +
> > +		if (sgt)
> > +			dma_sync_sgtable_for_cpu(attach->dev, sgt, dir);
> > +	}  
> 
> This conflicts with the debugging hack in DMA-buf.
> 
> I've recently sent out a patch to fix that, but it hasn't been pushed yet.

I guess you're talking about [1]. I'm not sure this is actually a
problem, because we don't pass the exporter's sg_table, we allocate a
new one each time. As for the importer's sg_table we attach to
dma_buf_attachment::priv, it would still be mangled, but I believe
that's fine, because the dma_sync_xxx helpers manipulate dma_addr_t,
not pages. Am I missing something?

[1] https://lists.linaro.org/archives/list/linaro-mm-sig@lists.linaro.org/thread/HZDQ7SFOG4722BISB6BEEA34B7QESM2O/
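
(Roughly, the sync helpers walk the DMA side of each entry and never
dereference the page pointers — which is why the debug mangling
shouldn't matter here. A simplified sketch, under that assumption, of
what dma_sync_sgtable_for_cpu() boils down to on a non-coherent
device:)

	struct scatterlist *sg;
	int i;

	/* Only sg_dma_address()/sg_dma_len() are used; sg_page() is not. */
	for_each_sgtable_dma_sg(sgt, sg, i)
		dma_sync_single_for_cpu(dev, sg_dma_address(sg),
					sg_dma_len(sg), dir);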

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-11-04  8:08 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
2025-10-30 14:25   ` Christian König
2025-10-30 14:35     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
2025-10-30 14:31   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation Christian König
2025-11-04  8:08     ` Boris Brezillon
2025-11-03 20:34   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Akash Goel
2025-11-04  7:42     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 04/16] drm/panthor: Provide a custom dma_buf implementation Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 05/16] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
2025-10-31  7:25   ` Marcin Ślusarz
2025-11-03 20:42   ` Akash Goel
2025-11-04  7:41     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 08/16] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-11-03 16:43   ` Akash Goel
2025-11-03 17:13     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 10/16] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 11/16] drm/panfrost: Provide a custom dma_buf implementation Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
2025-10-31  7:08   ` Marcin Ślusarz
2025-10-31  8:49     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 14/16] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 16/16] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).