dri-devel.lists.freedesktop.org archive mirror
* [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing
@ 2025-10-30 14:05 Boris Brezillon
  2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
                   ` (15 more replies)
  0 siblings, 16 replies; 30+ messages in thread
From: Boris Brezillon @ 2025-10-30 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: dri-devel, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Faith Ekstrand, Thierry Reding,
	Mikko Perttunen, Melissa Wen, Maíra Canal, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Frank Binns, Matt Coster,
	Rob Clark, Dmitry Baryshkov, Abhinav Kumar, Jessica Zhang,
	Sean Paul, Marijn Suijten, Alex Deucher, Christian König,
	amd-gfx, Boris Brezillon, kernel

This series implements cached maps and explicit flushing for both panfrost
and panthor. To avoid code/bug duplication, the tricky guts of the cache
flushing ioctl, which walk the sg list, are broken out into a new common
shmem helper that can be used by any driver.

The PanVK MR to use this lives here:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36385

A few words about this v5: Faith and I discussed a few cache
maintenance aspects that were bothering me, and she suggested
forbidding exports of CPU-cacheable BOs to simplify things. But while
doing that I realized this was still broken on systems where the Mali
GPU is coherent but other potential importers are not, because in that
case even BOs declared without WB_MMAP are upgraded to CPU-cacheable.
I first considered making those truly uncached, but that's not
possible because coherency is flagged at the device level, and making
a BO non-cacheable on these systems interferes with how shmem zeroes
out the pages at alloc time (with a temporary cached mapping) and how
Arm64 lazily defers flushes. Because dma_map_sgtable() ends up being a
NOP if the device is
coherent, we end up with dirty cachelines hanging around even after
we've created an uncached mapping to the same memory region, and this
leads to corruptions later on, or even worse, potential data leaks
because the uncached mapping can access memory before it's been zeroed
out.

TL;DR: CPU cache maintenance is a mess on Arm unless everything is
coherent, and we'll have to sort it out at some point, but I believe
this is orthogonal to us implementing proper
dma_buf_ops::{begin,end}_cpu_access(). So I eventually decided to
implement dma_buf_ops::{begin,end}_cpu_access() instead of pretending
we're good if we only export CPU-uncached BOs (which we can't really
do on systems where Mali is coherent, see above). I've hacked an
IGT test to make sure this does the right thing, and it seems to work.
Now, the real question is: is this the right thing to do? I basically
went for the system dma_heap approach where not only the exporter, but
also all importers have a dma_sync_sgtable_for_xxx() called on their
sgtable to prepare/end the CPU access. This should work if only CPU
cache maintenance is involved, but as soon as the importer needs more
than that (copying memory around to make it CPU/device visible), it's
going to be problematic. The only way to do that properly, would be
to add {begin,end}_cpu_access() hooks to dma_buf_attach_ops and let
the dma_buf core walk the importers to forward CPU access requests.

Sorry for the noise if you've been Cc-ed and don't care, but my goal
is to gather feedback on what exactly is expected from GPU drivers
exporting their CPU-cacheable buffers, and why so many drivers get
away with no dma_buf_ops::xxx_cpu_access() hooks, or very simple
ones that don't cover the cases I described above.

Changes in v2:
- Expose the coherency so userspace can know when it should skip cache
  maintenance
- Hook things up at the drm_gem_object_funcs level so dma-buf cpu_prep hooks
  can be implemented generically
- Revisit the semantics of the flags passed to gem_sync()
- Add BO_QUERY_INFO ioctls to query BO flags on imported objects and
  let the UMD know when cache maintenance is needed on those

Changes in v3:
- New patch to fix panthor_gpu_coherency_set()
- No other major changes, check each patch changelog for more details

Changes in v4:
- Two trivial fixes, check each patch changelog for more details

Changes in v5:
- Add a way to overload dma_buf_ops while still relying on the drm_prime
  boilerplate
- Add default shmem implementation for
  dma_buf_ops::{begin,end}_cpu_access()
- Provide custom dma_buf_ops to deal with CPU cache flushes around CPU
  accesses when the BO is CPU-cacheable
- Go back to a version of drm_gem_shmem_sync() that only deals with
  cache maintenance, and adjust the semantics to make it clear this is
  the only thing it cares about
- Adjust the BO_SYNC ioctls according to the new drm_gem_shmem_sync()
  semantics

Boris Brezillon (10):
  drm/prime: Simplify life of drivers needing custom dma_buf_ops
  drm/shmem: Provide a generic {begin,end}_cpu_access() implementation
  drm/panthor: Provide a custom dma_buf implementation
  drm/panthor: Fix panthor_gpu_coherency_set()
  drm/panthor: Expose the selected coherency protocol to the UMD
  drm/panthor: Add a PANTHOR_BO_SYNC ioctl
  drm/panthor: Add an ioctl to query BO flags
  drm/panfrost: Provide a custom dma_buf implementation
  drm/panfrost: Expose the selected coherency protocol to the UMD
  drm/panfrost: Add an ioctl to query BO flags

Faith Ekstrand (5):
  drm/shmem: Add a drm_gem_shmem_sync() helper
  drm/panthor: Bump the driver version to 1.6
  drm/panfrost: Add a PANFROST_SYNC_BO ioctl
  drm/panfrost: Add flag to map GEM object Write-Back Cacheable
  drm/panfrost: Bump the driver version to 1.6

Loïc Molinari (1):
  drm/panthor: Add flag to map GEM object Write-Back Cacheable

 drivers/gpu/drm/drm_gem_shmem_helper.c     | 207 +++++++++++++++++++++
 drivers/gpu/drm/drm_prime.c                |  14 +-
 drivers/gpu/drm/panfrost/panfrost_device.h |   1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c    |  98 +++++++++-
 drivers/gpu/drm/panfrost/panfrost_gem.c    |  73 ++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.h    |   9 +
 drivers/gpu/drm/panfrost/panfrost_gpu.c    |  26 ++-
 drivers/gpu/drm/panfrost/panfrost_regs.h   |  10 +-
 drivers/gpu/drm/panthor/panthor_device.c   |  10 +-
 drivers/gpu/drm/panthor/panthor_drv.c      |  80 +++++++-
 drivers/gpu/drm/panthor/panthor_gem.c      |  77 +++++++-
 drivers/gpu/drm/panthor/panthor_gem.h      |   6 +
 drivers/gpu/drm/panthor/panthor_gpu.c      |   2 +-
 drivers/gpu/drm/panthor/panthor_sched.c    |  18 +-
 include/drm/drm_drv.h                      |   8 +
 include/drm/drm_gem_shmem_helper.h         |  24 +++
 include/uapi/drm/panfrost_drm.h            |  76 +++++++-
 include/uapi/drm/panthor_drm.h             | 160 +++++++++++++++-
 18 files changed, 875 insertions(+), 24 deletions(-)

-- 
2.51.0


Thread overview:
2025-10-30 14:05 [PATCH v5 00/16] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 01/16] drm/prime: Simplify life of drivers needing custom dma_buf_ops Boris Brezillon
2025-10-30 14:25   ` Christian König
2025-10-30 14:35     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Boris Brezillon
2025-10-30 14:31   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin,end}_cpu_access() implementation Christian König
2025-11-04  8:08     ` Boris Brezillon
2025-11-03 20:34   ` [PATCH v5 02/16] drm/shmem: Provide a generic {begin, end}_cpu_access() implementation Akash Goel
2025-11-04  7:42     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 03/16] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 04/16] drm/panthor: Provide a custom dma_buf implementation Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 05/16] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 06/16] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 07/16] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
2025-10-31  7:25   ` Marcin Ślusarz
2025-11-03 20:42   ` Akash Goel
2025-11-04  7:41     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 08/16] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 09/16] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-11-03 16:43   ` Akash Goel
2025-11-03 17:13     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 10/16] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 11/16] drm/panfrost: Provide a custom dma_buf implementation Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 12/16] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 13/16] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
2025-10-31  7:08   ` Marcin Ślusarz
2025-10-31  8:49     ` Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 14/16] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 15/16] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-30 14:05 ` [PATCH v5 16/16] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon
