All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Brezillon <boris.brezillon@collabora.com>
To: Faith Ekstrand <faith@gfxstrand.net>
Cc: Steven Price <steven.price@arm.com>,
	dri-devel@lists.freedesktop.org,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Faith Ekstrand <faith.ekstrand@collabora.com>,
	kernel@collabora.com, Matt Coster <Matt.Coster@imgtec.com>
Subject: Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper
Date: Fri, 17 Oct 2025 18:35:05 +0200	[thread overview]
Message-ID: <20251017183505.063a831a@fedora> (raw)
In-Reply-To: <CAOFGe96n+XZC_Hu7NBahA8Zw206FYXmJMC_hKfbZ2cqu453oWA@mail.gmail.com>

On Fri, 17 Oct 2025 11:57:08 -0400
Faith Ekstrand <faith@gfxstrand.net> wrote:

> On Fri, Oct 17, 2025 at 11:50 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > +Matt for my comment on PVR having the same issue.
> >
> > On Fri, 17 Oct 2025 11:35:54 -0400
> > Faith Ekstrand <faith@gfxstrand.net> wrote:
> >  
> > > On Fri, Oct 17, 2025 at 11:27 AM Boris Brezillon
> > > <boris.brezillon@collabora.com> wrote:  
> > > >
> > > > On Fri, 17 Oct 2025 10:40:46 -0400
> > > > Faith Ekstrand <faith@gfxstrand.net> wrote:
> > > >  
> > > > > On Fri, Oct 17, 2025 at 10:32 AM Faith Ekstrand <faith@gfxstrand.net> wrote:  
> > > > > >
> > > > > > On Wed, Oct 15, 2025 at 12:04 PM Boris Brezillon
> > > > > > <boris.brezillon@collabora.com> wrote:  
> > > > > > >
> > > > > > > Prepare things for standardizing synchronization around CPU accesses
> > > > > > > of GEM buffers. This will be used to provide default
> > > > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and provide
> > > > > > > a way for drivers to add their own ioctls to synchronize CPU
> > > > > > > writes/reads when they can't do it directly from userland.
> > > > > > >
> > > > > > > v2:
> > > > > > > - New commit
> > > > > > >
> > > > > > > v3:
> > > > > > > - No changes
> > > > > > >
> > > > > > > v4:
> > > > > > > - Add Steve's R-b
> > > > > > >
> > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > > > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/drm_gem.c | 10 +++++++++
> > > > > > >  include/drm/drm_gem.h     | 45 +++++++++++++++++++++++++++++++++++++++
> > > > > > >  2 files changed, 55 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > > > > > > index a1a9c828938b..a1431e4f2404 100644
> > > > > > > --- a/drivers/gpu/drm/drm_gem.c
> > > > > > > +++ b/drivers/gpu/drm/drm_gem.c
> > > > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL(drm_gem_vunmap);
> > > > > > >
> > > > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, size_t size,
> > > > > > > +                enum drm_gem_object_access_flags access)
> > > > > > > +{
> > > > > > > +       if (obj->funcs->sync)
> > > > > > > +               return obj->funcs->sync(obj, offset, size, access);
> > > > > > > +
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(drm_gem_sync);
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * drm_gem_lock_reservations - Sets up the ww context and acquires
> > > > > > >   * the lock on an array of GEM objects.
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 8d48d2af2649..1c33e59ab305 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status {
> > > > > > >         DRM_GEM_OBJECT_ACTIVE    = BIT(2),
> > > > > > >  };
> > > > > > >
> > > > > > > +/**
> > > > > > > + * enum drm_gem_object_status - bitmask describing GEM access types to prepare for
> > > > > > > + */
> > > > > > > +enum drm_gem_object_access_flags {
> > > > > > > +       /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU access. */
> > > > > > > +       DRM_GEM_OBJECT_CPU_ACCESS = 0,
> > > > > > > +
> > > > > > > +       /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device access. */
> > > > > > > +       DRM_GEM_OBJECT_DEV_ACCESS = BIT(0),
> > > > > > > +
> > > > > > > +       /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check the entity doing the access. */
> > > > > > > +       DRM_GEM_OBJECT_ACCESSOR_MASK = BIT(0),
> > > > > > > +
> > > > > > > +       /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-only accesses. */
> > > > > > > +       DRM_GEM_OBJECT_READ_ACCESS = BIT(1),
> > > > > > > +
> > > > > > > +       /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-only accesses. */
> > > > > > > +       DRM_GEM_OBJECT_WRITE_ACCESS = BIT(2),
> > > > > > > +
> > > > > > > +       /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/write accesses. */
> > > > > > > +       DRM_GEM_OBJECT_RW_ACCESS = DRM_GEM_OBJECT_READ_ACCESS |
> > > > > > > +                                  DRM_GEM_OBJECT_WRITE_ACCESS,
> > > > > > > +
> > > > > > > +       /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to check the access type. */
> > > > > > > +       DRM_GEM_OBJECT_ACCESS_TYPE_MASK = DRM_GEM_OBJECT_RW_ACCESS,
> > > > > > > +};
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * struct drm_gem_object_funcs - GEM object functions
> > > > > > >   */
> > > > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs {
> > > > > > >          */
> > > > > > >         int (*mmap)(struct drm_gem_object *obj, struct vm_area_struct *vma);
> > > > > > >
> > > > > > > +       /**
> > > > > > > +        * @sync:
> > > > > > > +        *
> > > > > > > +        * Prepare for CPU/device access. This can involve migration of
> > > > > > > +        * a buffer to the system-RAM/VRAM, or for UMA, flushing/invalidating
> > > > > > > +        * the CPU caches. The range can be used to optimize the synchronization
> > > > > > > +        * when possible.  
> > > > > >
> > > > > > This has gone in a very different direction from the version I sent
> > > > > > out and the added generality makes me really nervous. The idea of sync
> > > > > > involving migration and that the range is a mere hint are antithetical
> > > > > > with Vulkan. It's a very GLish design that assumes that a BO is
> > > > > > exclusively used by one of the CPU or the GPU at the same time. This
> > > > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well as
> > > > > > dma-buf itself) are littered with such ioctls and we're in the process
> > > > > > of deleting them all.  
> > > > >
> > > > > And yes, I realize I sent this on the patch for the hook which you
> > > > > intended to plumb through to dma-buf. However, I also saw it being
> > > > > propagated to an ioctl and I didn't know where else to put it that had
> > > > > the relevant details.
> > > > >
> > > > > ~Faith
> > > > >  
> > > > > > If the BO needs to be migrated in order to be accessed from the CPU,
> > > > > > that needs to happen on map, not on some sort of begin/end. Or better
> > > > > > yet, just disallow mapping such buffers. Once the client has a map,
> > > > > > they are free to access from the CPU while stuff is running on the
> > > > > > GPU. They have to be careful, of course, not to cause data races, but
> > > > > > accessing the same BO from the CPU and GPU or even the same range is
> > > > > > totally okay if you aren't racing.
> > > > > >
> > > > > > As a corollary, just don't map PRIME buffers.
> > > > > >
> > > > > > And the range really shouldn't be just a hint. With Vulkan, clients
> > > > > > are regularly sub-allocating from larger memory objects. If they ask
> > > > > > to flush 64B and end up flushing 64M, that's pretty bad.
> > > > > >
> > > > > > All we need is something which lets us trap through to the kernel for
> > > > > > CPU cache management. That's all we need and that's really all it
> > > > > > should do.  
> > > >
> > > > Okay, so there's actually a problem with that I think, because we can't
> > > > know how the buffer we export will be used. It can be imported by the
> > > > same driver, and we're all good, but it can also be imported by a
> > > > different driver, which decides to vmap or allow mmap() on it, and then
> > > > we have to implement the dma_buf CPU sync hooks. Unless we decide that
> > > > all exported buffers should be write-combine only? This is the very
> > > > reason I started hooking things up on the dma_buf side, because we're
> > > > not in control of who the importer of our buffers is.  
> > >
> > > Exported buffers should be WC-only. We could try to get creative but
> > > the moment we let the lack of coherency leak to other processes,
> > > potentially to other drivers, we're in a world of hurt. Even with the
> > > dma-buf begin/end hooks, if it's imported into a driver that does
> > > Vulkan, those hooks don't make sense and we're screwed.  
> >
> > Well, yeah, the 'entire-buf' granularity is a problem, indeed,  
> 
> If all that's being done is cache flushing, it's kind of okay. It's a
> big hammer but it's okay. However, if the driver is doing literally
> anything else in begin/end, it's all a lie the moment you allow that
> buffer to be mapped in Vulkan. The client may be reading from the CPU
> for GPU download in one subrange, writing from the CPU for GPU upload
> in another subrange, and reading/writing from the GPU in another
> subrange all at the same time. That's totally incompatible with the
> dma-buf begin/end model.
> 
> > and
> > there's no way around it with the current dma-buf API, which is why I
> > prevented external bufs from being mapped (AKA not host-visible in
> > Vulkan's term). But I really thought imported buffers coming from
> > panthor should be mapped cached.  
> 
> Yeah. So we could potentially allow WB maps if the client requests an
> export via OPAQUE_FD since we know a priori that such a buffer will
> only ever be re-imported into panfrost/panvk. However, I really don't
> think that's a common enough case to be worth optimizing just yet.
> 
> > > And, yes, I
> > > know panvk is the only Vulkan implementation you're going to see on a
> > > system with an Arm GPU, but thinking about things in the general case
> > > across all of DRM, exporting non-coherent memory in 2025 is just
> > > cursed.  
> >
> > Hm, okay. If that's the way to go, then we should enforce
> >
> >         !WB_MMAP || !PRIVATE
> >
> > in panthor, and fail the export of a WB_MMAP buffer in panfrost (we
> > don't have a way to know that a buffer is private there).  
> 
> Yeah, I'm a fan of that for now.

Might also require more changes when the GPU is IO-coherent, because
IO-coherency is a per-device thing on Arm. Right now we forcibly map
things WB even if the user didn't ask for it when the GPU is coherent,
but the importer might not be IO-coherent. I think it's fine in Panthor
because we can properly define the shareability/cachebility attributes,
but on older Mali gens, I remember hitting some issues. For instance, we
had issues when mapping things uncached on some IO-coherent amlogic
integration of the G52. I've managed to disable coherency on my VIM3
with a few hacks (switching to the new page table format was one of
them, but I'm not sure what I did is portable to older gens that only
support the old page table format), but I'd have to make sure this can
be done on a per mapping basis. Steve probably knows more about that.

TLDR; maybe we should focus on Panthor first, so we don't block this MR
on some changes breaking old HW in Panfrost.

  reply	other threads:[~2025-10-17 16:35 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-15 16:03 [PATCH v4 00/14] drm/panfrost, panthor: Cached maps and explicit flushing Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 01/14] drm/panthor: Fix panthor_gpu_coherency_set() Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Boris Brezillon
2025-10-16  8:13   ` Maxime Ripard
2025-10-16 12:57     ` Boris Brezillon
2025-10-16 13:07       ` Boris Brezillon
2025-10-29  9:30         ` Maxime Ripard
2025-10-30 14:07           ` Boris Brezillon
2025-10-16  8:32   ` Thomas Zimmermann
2025-10-16  9:40     ` Boris Brezillon
2025-10-16  9:58       ` Thomas Zimmermann
2025-10-16 10:32         ` Boris Brezillon
2025-10-16 11:42           ` Thomas Zimmermann
2025-10-16 12:24             ` Boris Brezillon
2025-10-30 14:11             ` Boris Brezillon
2025-10-17 14:32   ` Faith Ekstrand
2025-10-17 14:40     ` Faith Ekstrand
2025-10-17 15:26       ` Boris Brezillon
2025-10-17 15:35         ` Faith Ekstrand
2025-10-17 15:50           ` Boris Brezillon
2025-10-17 15:57             ` Faith Ekstrand
2025-10-17 16:35               ` Boris Brezillon [this message]
2025-10-15 16:03 ` [PATCH v4 03/14] drm/prime: Provide default ::{begin, end}_cpu_access() implementations Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 04/14] drm/shmem: Add a drm_gem_shmem_sync() helper Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 05/14] drm/panthor: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 06/14] drm/panthor: Add a PANTHOR_BO_SYNC ioctl Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 07/14] drm/panthor: Add an ioctl to query BO flags Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 08/14] drm/panthor: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 09/14] drm/panthor: Bump the driver version to 1.6 Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 10/14] drm/panfrost: Expose the selected coherency protocol to the UMD Boris Brezillon
2025-10-15 16:06   ` Steven Price
2025-10-15 16:03 ` [PATCH v4 11/14] drm/panfrost: Add a PANFROST_SYNC_BO ioctl Boris Brezillon
2025-10-16  8:42   ` Marcin Ślusarz
2025-10-16  9:52     ` Boris Brezillon
2025-10-16 10:02       ` Marcin Ślusarz
2025-10-16 10:35         ` Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 12/14] drm/panfrost: Add an ioctl to query BO flags Boris Brezillon
2025-10-15 16:03 ` [PATCH v4 13/14] drm/panfrost: Add flag to map GEM object Write-Back Cacheable Boris Brezillon
2025-10-15 16:06   ` Steven Price
2025-10-15 16:03 ` [PATCH v4 14/14] drm/panfrost: Bump the driver version to 1.6 Boris Brezillon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251017183505.063a831a@fedora \
    --to=boris.brezillon@collabora.com \
    --cc=Matt.Coster@imgtec.com \
    --cc=airlied@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=faith@gfxstrand.net \
    --cc=kernel@collabora.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=simona@ffwll.ch \
    --cc=steven.price@arm.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.