From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 43B15CCD199 for ; Fri, 17 Oct 2025 16:35:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9463C10ECA3; Fri, 17 Oct 2025 16:35:12 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="BtOkqvX7"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 60EA310EC88 for ; Fri, 17 Oct 2025 16:35:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1760718909; bh=m4n8KlJ9ZvdgC6ZXvYXyuYACea8cfYiGk2EFxV0MhCQ=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=BtOkqvX73ysXYWRElrlYes2fe16UerGoduY6uiuPivJPdf135fVoIJLhaC9PDRomh e1Dpw5W1ULTfd/z9rufBTTCQgjuQUfFWf+DcziID1NtBTy+8rwksFA78KwzipNXIBn IahKBtWqsbTWHGxDYPyFAy50jOmQBy9q8L+lgLbfo24fnVJwZ/ETLfR0ht6yPLGN3s 3xJeC8/zK2alaI3J3Y5oIbRbH3U1UYX/REJs27mYmbHwcE2FTX7BE6ejN+PU8ToaK6 iwgFeg9nBVnpCWxE8mUy025edYNPopchYw/OnuDAG+uC3oAnXT5araWznmoqQU1WUX 3mqtT9FrWCmBQ== Received: from fedora (unknown [IPv6:2a01:e0a:2c:6930:d919:a6e:5ea1:8a9f]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 6A72217E0125; Fri, 17 Oct 2025 18:35:09 +0200 (CEST) Date: Fri, 17 Oct 2025 18:35:05 +0200 From: Boris Brezillon To: Faith Ekstrand Cc: Steven Price , dri-devel@lists.freedesktop.org, Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Simona Vetter , Faith Ekstrand , kernel@collabora.com, Matt Coster Subject: Re: [PATCH v4 02/14] drm/gem: Add a drm_gem_object_funcs::sync() and a drm_gem_sync() helper Message-ID: <20251017183505.063a831a@fedora> In-Reply-To: References: <20251015160326.3657287-1-boris.brezillon@collabora.com> <20251015160326.3657287-3-boris.brezillon@collabora.com> <20251017172657.2690bbca@fedora> <20251017175008.3ac142c7@fedora> Organization: Collabora X-Mailer: Claws Mail 4.3.1 (GTK 3.24.49; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Fri, 17 Oct 2025 11:57:08 -0400 Faith Ekstrand wrote: > On Fri, Oct 17, 2025 at 11:50=E2=80=AFAM Boris Brezillon > wrote: > > > > +Matt for my comment on PVR having the same issue. > > > > On Fri, 17 Oct 2025 11:35:54 -0400 > > Faith Ekstrand wrote: > > =20 > > > On Fri, Oct 17, 2025 at 11:27=E2=80=AFAM Boris Brezillon > > > wrote: =20 > > > > > > > > On Fri, 17 Oct 2025 10:40:46 -0400 > > > > Faith Ekstrand wrote: > > > > =20 > > > > > On Fri, Oct 17, 2025 at 10:32=E2=80=AFAM Faith Ekstrand wrote: =20 > > > > > > > > > > > > On Wed, Oct 15, 2025 at 12:04=E2=80=AFPM Boris Brezillon > > > > > > wrote: =20 > > > > > > > > > > > > > > Prepare things for standardizing synchronization around CPU a= ccesses > > > > > > > of GEM buffers. This will be used to provide default > > > > > > > drm_gem_dmabuf_{begin,end}_cpu_access() implementations, and = provide > > > > > > > a way for drivers to add their own ioctls to synchronize CPU > > > > > > > writes/reads when they can't do it directly from userland. > > > > > > > > > > > > > > v2: > > > > > > > - New commit > > > > > > > > > > > > > > v3: > > > > > > > - No changes > > > > > > > > > > > > > > v4: > > > > > > > - Add Steve's R-b > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon > > > > > > > Reviewed-by: Steven Price > > > > > > > --- > > > > > > > drivers/gpu/drm/drm_gem.c | 10 +++++++++ > > > > > > > include/drm/drm_gem.h | 45 +++++++++++++++++++++++++++++= ++++++++++ > > > > > > > 2 files changed, 55 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_= gem.c > > > > > > > index a1a9c828938b..a1431e4f2404 100644 > > > > > > > --- a/drivers/gpu/drm/drm_gem.c > > > > > > > +++ b/drivers/gpu/drm/drm_gem.c > > > > > > > @@ -1333,6 +1333,16 @@ void drm_gem_vunmap(struct drm_gem_obj= ect *obj, struct iosys_map *map) > > > > > > > } > > > > > > > EXPORT_SYMBOL(drm_gem_vunmap); > > > > > > > > > > > > > > +int drm_gem_sync(struct drm_gem_object *obj, size_t offset, = size_t size, > > > > > > > + enum drm_gem_object_access_flags access) > > > > > > > +{ > > > > > > > + if (obj->funcs->sync) > > > > > > > + return obj->funcs->sync(obj, offset, size, ac= cess); > > > > > > > + > > > > > > > + return 0; > > > > > > > +} > > > > > > > +EXPORT_SYMBOL(drm_gem_sync); > > > > > > > + > > > > > > > /** > > > > > > > * drm_gem_lock_reservations - Sets up the ww context and ac= quires > > > > > > > * the lock on an array of GEM objects. > > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h > > > > > > > index 8d48d2af2649..1c33e59ab305 100644 > > > > > > > --- a/include/drm/drm_gem.h > > > > > > > +++ b/include/drm/drm_gem.h > > > > > > > @@ -66,6 +66,33 @@ enum drm_gem_object_status { > > > > > > > DRM_GEM_OBJECT_ACTIVE =3D BIT(2), > > > > > > > }; > > > > > > > > > > > > > > +/** > > > > > > > + * enum drm_gem_object_status - bitmask describing GEM acces= s types to prepare for > > > > > > > + */ > > > > > > > +enum drm_gem_object_access_flags { > > > > > > > + /** @DRM_GEM_OBJECT_CPU_ACCESS: Prepare for a CPU acc= ess. */ > > > > > > > + DRM_GEM_OBJECT_CPU_ACCESS =3D 0, > > > > > > > + > > > > > > > + /** @DRM_GEM_OBJECT_DEV_ACCESS: Prepare for a device = access. */ > > > > > > > + DRM_GEM_OBJECT_DEV_ACCESS =3D BIT(0), > > > > > > > + > > > > > > > + /** @DRM_GEM_OBJECT_ACCESSOR_MASK: Mask used to check= the entity doing the access. */ > > > > > > > + DRM_GEM_OBJECT_ACCESSOR_MASK =3D BIT(0), > > > > > > > + > > > > > > > + /** @DRM_GEM_OBJECT_READ_ACCESS: Prepare for read-onl= y accesses. */ > > > > > > > + DRM_GEM_OBJECT_READ_ACCESS =3D BIT(1), > > > > > > > + > > > > > > > + /** @DRM_GEM_OBJECT_WRITE_ACCESS: Prepare for write-o= nly accesses. */ > > > > > > > + DRM_GEM_OBJECT_WRITE_ACCESS =3D BIT(2), > > > > > > > + > > > > > > > + /** @DRM_GEM_OBJECT_RW_ACCESS: Prepare for a read/wri= te accesses. */ > > > > > > > + DRM_GEM_OBJECT_RW_ACCESS =3D DRM_GEM_OBJECT_READ_ACCE= SS | > > > > > > > + DRM_GEM_OBJECT_WRITE_ACCES= S, > > > > > > > + > > > > > > > + /** @DRM_GEM_OBJECT_ACCESS_TYPE_MASK: Mask used to ch= eck the access type. */ > > > > > > > + DRM_GEM_OBJECT_ACCESS_TYPE_MASK =3D DRM_GEM_OBJECT_RW= _ACCESS, > > > > > > > +}; > > > > > > > + > > > > > > > /** > > > > > > > * struct drm_gem_object_funcs - GEM object functions > > > > > > > */ > > > > > > > @@ -191,6 +218,21 @@ struct drm_gem_object_funcs { > > > > > > > */ > > > > > > > int (*mmap)(struct drm_gem_object *obj, struct vm_are= a_struct *vma); > > > > > > > > > > > > > > + /** > > > > > > > + * @sync: > > > > > > > + * > > > > > > > + * Prepare for CPU/device access. This can involve mi= gration of > > > > > > > + * a buffer to the system-RAM/VRAM, or for UMA, flush= ing/invalidating > > > > > > > + * the CPU caches. The range can be used to optimize = the synchronization > > > > > > > + * when possible. =20 > > > > > > > > > > > > This has gone in a very different direction from the version I = sent > > > > > > out and the added generality makes me really nervous. The idea = of sync > > > > > > involving migration and that the range is a mere hint are antit= hetical > > > > > > with Vulkan. It's a very GLish design that assumes that a BO is > > > > > > exclusively used by one of the CPU or the GPU at the same time.= This > > > > > > simply isn't the case in modern APIs. Older DRM uAPIs (as well = as > > > > > > dma-buf itself) are littered with such ioctls and we're in the = process > > > > > > of deleting them all. =20 > > > > > > > > > > And yes, I realize I sent this on the patch for the hook which you > > > > > intended to plumb through to dma-buf. However, I also saw it being > > > > > propagated to an ioctl and I didn't know where else to put it tha= t had > > > > > the relevant details. > > > > > > > > > > ~Faith > > > > > =20 > > > > > > If the BO needs to be migrated in order to be accessed from the= CPU, > > > > > > that needs to happen on map, not on some sort of begin/end. Or = better > > > > > > yet, just disallow mapping such buffers. Once the client has a = map, > > > > > > they are free to access from the CPU while stuff is running on = the > > > > > > GPU. They have to be careful, of course, not to cause data race= s, but > > > > > > accessing the same BO from the CPU and GPU or even the same ran= ge is > > > > > > totally okay if you aren't racing. > > > > > > > > > > > > As a corollary, just don't map PRIME buffers. > > > > > > > > > > > > And the range really shouldn't be just a hint. With Vulkan, cli= ents > > > > > > are regularly sub-allocating from larger memory objects. If the= y ask > > > > > > to flush 64B and end up flushing 64M, that's pretty bad. > > > > > > > > > > > > All we need is something which lets us trap through to the kern= el for > > > > > > CPU cache management. That's all we need and that's really all = it > > > > > > should do. =20 > > > > > > > > Okay, so there's actually a problem with that I think, because we c= an't > > > > know how the buffer we export will be used. It can be imported by t= he > > > > same driver, and we're all good, but it can also be imported by a > > > > different driver, which decides to vmap or allow mmap() on it, and = then > > > > we have to implement the dma_buf CPU sync hooks. Unless we decide t= hat > > > > all exported buffers should be write-combine only? This is the very > > > > reason I started hooking things up on the dma_buf side, because we'= re > > > > not in control of who the importer of our buffers is. =20 > > > > > > Exported buffers should be WC-only. We could try to get creative but > > > the moment we let the lack of coherency leak to other processes, > > > potentially to other drivers, we're in a world of hurt. Even with the > > > dma-buf begin/end hooks, if it's imported into a driver that does > > > Vulkan, those hooks don't make sense and we're screwed. =20 > > > > Well, yeah, the 'entire-buf' granularity is a problem, indeed, =20 >=20 > If all that's being done is cache flushing, it's kind of okay. It's a > big hammer but it's okay. However, if the driver is doing literally > anything else in begin/end, it's all a lie the moment you allow that > buffer to be mapped in Vulkan. The client may be reading from the CPU > for GPU download in one subrange, writing from the CPU for GPU upload > in another subrange, and reading/writing from the GPU in another > subrange all at the same time. That's totally incompatible with the > dma-buf begin/end model. >=20 > > and > > there's no way around it with the current dma-buf API, which is why I > > prevented external bufs from being mapped (AKA not host-visible in > > Vulkan's term). But I really thought imported buffers coming from > > panthor should be mapped cached. =20 >=20 > Yeah. So we could potentially allow WB maps if the client requests an > export via OPAQUE_FD since we know a priori that such a buffer will > only ever be re-imported into panfrost/panvk. However, I really don't > think that's a common enough case to be worth optimizing just yet. >=20 > > > And, yes, I > > > know panvk is the only Vulkan implementation you're going to see on a > > > system with an Arm GPU, but thinking about things in the general case > > > across all of DRM, exporting non-coherent memory in 2025 is just > > > cursed. =20 > > > > Hm, okay. If that's the way to go, then we should enforce > > > > !WB_MMAP || !PRIVATE > > > > in panthor, and fail the export of a WB_MMAP buffer in panfrost (we > > don't have a way to know that a buffer is private there). =20 >=20 > Yeah, I'm a fan of that for now. Might also require more changes when the GPU is IO-coherent, because IO-coherency is a per-device thing on Arm. Right now we forcibly map things WB even if the user didn't ask for it when the GPU is coherent, but the importer might not be IO-coherent. I think it's fine in Panthor because we can properly define the shareability/cachebility attributes, but on older Mali gens, I remember hitting some issues. For instance, we had issues when mapping things uncached on some IO-coherent amlogic integration of the G52. I've managed to disable coherency on my VIM3 with a few hacks (switching to the new page table format was one of them, but I'm not sure what I did is portable to older gens that only support the old page table format), but I'd have to make sure this can be done on a per mapping basis. Steve probably knows more about that. TLDR; maybe we should focus on Panthor first, so we don't block this MR on some changes breaking old HW in Panfrost.