From: "Christian König" <christian.koenig@amd.com>
To: Xaver Hugl <xaver.hugl@kde.org>
Cc: "Julian Orth" <ju.orth@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Shuah Khan" <skhan@linuxfoundation.org>,
"Arnd Bergmann" <arnd@arndb.de>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
linux-doc@vger.kernel.org, wayland-devel@lists.freedesktop.org,
"Michel Dänzer" <michel.daenzer@mailbox.org>
Subject: Re: [PATCH 00/12] misc/syncobj: add /dev/syncobj device
Date: Wed, 20 May 2026 16:06:40 +0200
Message-ID: <c9fbfdaf-2a58-4423-8dc5-6e29a88f6293@amd.com>
In-Reply-To: <CAFZQkGxpPm081Fz8UtDuBA1PKD42+9YDA+cc6fbSpfawXwu9+g@mail.gmail.com>
On 5/20/26 14:33, Xaver Hugl wrote:
> On Wed, 20 May 2026 at 10:08, Christian König
> <christian.koenig@amd.com> wrote:
>> Well I would say the other way around is a pretty common use case.
>>
>> In other words the compositor uses the internal GPU for composing and displaying the picture. And the client uses the external GPU for fast rendering.
> Sure, but that's not what I'm talking about.
Yeah sorry for that, I wasn't sure if I misunderstood your use case because it's usually the other way around.
>>> - the buffers from the client stay valid
>>
>> Buffers from the hot plugged GPU don't stay valid. Accessing CPU mappings either results in a SIGBUS or is redirected to a dummy page.
> Again, not what I wrote about. The buffers are on the integrated GPU.
The general rule of thumb is that as long as the exporter stays around, the buffers stay around as well.
>>> - the syncobj stays valid on the client side
>>> - the syncobj becomes invalid on the compositor side
>>
>> Nope that's not correct. The syncobj itself stays valid even if you completely hot plug the device.
>>
>> It can just be that the fences inside the syncobj are terminated with an error.
> What about eventfd created for a point on the syncobj?
Unfortunately the eventfd has no error handling as far as I know, so when a fence signals with an error condition, all you see on the eventfd is that it signaled.
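To make the limitation concrete, here is a minimal sketch of what a compositor gets back from such an eventfd. It assumes the eventfd was created non-blocking and already registered for a timeline point; `acquire_point_fired()` is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <sys/eventfd.h>

/*
 * Drain an eventfd that was registered for a syncobj timeline point.
 * Returns 1 if it fired, 0 if it has not fired yet (requires EFD_NONBLOCK),
 * and a negative errno if reading failed.
 *
 * The 8-byte counter is the only payload an eventfd carries, so a fence
 * that signaled with an error looks exactly like one that succeeded.
 */
static int acquire_point_fired(int efd)
{
	eventfd_t count;

	errno = 0;
	if (eventfd_read(efd, &count) == 0)
		return 1;
	if (errno == EAGAIN)
		return 0;
	return errno ? -errno : -EIO;
}
```

Nothing in the return path can carry a fence status, which is why the error has to be queried from the syncobj or syncfile itself.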
> Another (future) problem with hotplugs will be if the sync file hasn't
> materialized for the timeline point when the device is hotunplugged,
> since there can't be an error on the fence if there isn't one. Or
> could userspace somehow set an 'artificial' fence with an error in
> that case?
In general the answer is yes: userspace needs to take care of inserting fences when wait-before-signal is used and the work cannot be submitted to the HW for some reason.
Currently we only have an IOCTL to insert a signaled dummy fence at some timeline sequence, but it should be trivial to also insert a signaled fence with an error code.
But the compositor needs to be able to handle that case anyway, because a malicious or just buggy client may simply never insert the fence.
So a device being hot-unplugged is no different from a client not inserting the fence in the first place.
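One way a compositor could guard against a fence that never shows up is to bound the wait. This is a rough sketch only: the helper names, the enum and the timeout policy are assumptions made for illustration, and a real compositor would add the eventfd to its event loop with a timer instead of blocking in poll().

```c
#include <errno.h>
#include <poll.h>
#include <sys/eventfd.h>

enum acquire_wait { ACQUIRE_READY, ACQUIRE_TIMED_OUT, ACQUIRE_FAILED };

/* Hypothetical helper: the eventfd that would be registered for a point. */
static int new_acquire_eventfd(void)
{
	return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/*
 * Wait up to timeout_ms for the acquire point's eventfd to become readable.
 * A timeout covers both cases from the discussion: the client never
 * inserted a fence, or its device went away before it could.
 */
static enum acquire_wait wait_acquire_point(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	int ret;

	/* Simplification: a retry after EINTR restarts the full timeout. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return ACQUIRE_FAILED;
	if (ret == 0)
		return ACQUIRE_TIMED_OUT;
	return (pfd.revents & POLLIN) ? ACQUIRE_READY : ACQUIRE_FAILED;
}
```

What to do on ACQUIRE_TIMED_OUT (keep the old buffer, disconnect the client, and so on) is a compositor policy decision that this sketch does not cover.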
>>> "invalid" there means either
>>> - the acquire point of the client is marked as signaled, before
>>> rendering on the client side is completed
>>> - the acquire point of the client is never signaled. Since the
>>> compositor waits for the acquire point, the Wayland surface is stuck
>>> forever
>>
>> Both of those would be a *massive* violation of documented kernel rules for hot-plugging which could lead to random data corruption and/or deadlocks.
>>
>> If you see any HW driver showing behavior like that please open up a bug report and ping the relevant maintainers immediately.
> If there are no error codes with syncobj yet, then to userspace, the
> latter behavior is exactly what we get, isn't it?
No, from the userspace side you just see a signaled fence. To see the error code you need to export the timeline point of the syncobj to a syncfile and then call the QUERY IOCTL on the syncfile.
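A sketch of the query step, assuming the timeline point has already been exported to a sync file fd and that the query in question is the SYNC_IOC_FILE_INFO ioctl from <linux/sync_file.h>. The helper name `sync_file_query_status()` is made up for this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Query the overall fence status of a sync file.
 * Returns 0 and stores the status in *status on success, or a negative
 * errno if the ioctl itself failed (*status is left untouched then).
 *
 * Stored status: 1 = signaled without error, 0 = still pending,
 * negative = the fence signaled with that error code.
 */
static int sync_file_query_status(int sync_file_fd, int *status)
{
	struct sync_file_info info;

	/* flags and pad must be zero; num_fences == 0 asks for the summary only */
	memset(&info, 0, sizeof(info));
	if (ioctl(sync_file_fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return -errno;

	*status = info.status;
	return 0;
}
```

Keeping the ioctl result separate from the fence status avoids confusing a failed query with a fence that carries an error.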
>> When a hotplug happens all operations of the device should return an -ENODEV error, even when exposed to other devices/application through syncobj or syncfile.
> Okay, that at least gives us a way to fail imports somewhat
> gracefully. Normally, failing to import a syncobj is a fatal error in
> the Wayland protocol.
So the task at hand would be to avoid importing the syncobj into a driver. That should be relatively trivial.
The only real problem I see is if you want to create a syncobj without having any device whatsoever.
>> One problem is that only syncfile allows for querying such error codes at the moment, we have patches pending to add that to syncobj as well but we lack a compositor with support for that as userspace client.
> As long as the error case can be detected with an eventfd,
Yeah that's the problem. The eventfd only tells you if the operation is completed (or at least has materialized).
To query the error you would need to ask the underlying syncobj or syncfile directly.
> implementing that in KWin shouldn't be a challenge.
>
>> Well the question here is if the device the compositor is using or the client is using is gone?
>>
>> If the client device is hot removed the compositor should be perfectly capable of importing the syncobj.
>>
>> If the compositor device is gone then you don't have a device to display anything any more, so generating the next frame doesn't seem to make sense either.
>>
>> What could be is that you want the compositor to be kept alive even when the display device is gone to switch over to vkms or whatever so that a VNC session or other remote desktop still works.
> There are two GPUs in the example I gave. The compositor can use both
> for rendering (in cosmic-comp's case) or switch between them (what I'm
> trying to do with KWin), or use one device for rendering, and another
> for importing the syncobj.
Ah! I think I got the problem now. You basically want to avoid importing the syncobj because when the wrong device goes away you are busted.
The reason we didn't consider having the IOCTLs on the FD is that if you don't import them and instead keep them around, you can run out of file descriptors quite quickly.
When you have a use case where you receive an FD from the client and do a one-shot conversion to an eventfd, that will probably work, but for keeping them in the long run you need some kind of container for the syncobjs, don't you?
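To put a number on the file descriptor concern, a process can check its RLIMIT_NOFILE soft limit before deciding to hold on to syncobj FDs. The following is only an illustration: both helper names and the idea of a fixed reserve are assumptions, not something proposed in this thread.

```c
#include <limits.h>
#include <sys/resource.h>

/* Soft limit on open file descriptors, or -1 if it cannot be queried. */
static long fd_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
		return LONG_MAX;
	return (long)rl.rlim_cur;
}

/*
 * Hypothetical policy check: may we keep `wanted` more syncobj FDs open
 * when `held` FDs are already open and `reserve` must stay free for
 * everything else (client sockets, dma-bufs, ...)?
 */
static int syncobj_fds_fit(long limit, long held, long wanted, long reserve)
{
	return held + wanted + reserve <= limit;
}
```

With a soft limit of 1024, for example, a compositor holding one FD per pending timeline point per surface can reach the limit with a modest number of clients.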
>>>>>>> 3. It removes the need to translate between syncobjs fds and handles.
>>>>>>
>>>>>> That's a pretty big no-go as well. The differentiation between FDs and handles is completely intentional.
>>>>> Could you expand on why it's needed? For compositors, the handle is
>>>>> just an intermediary thing when translating between file descriptors.
>>>>
>>>> Well what we could do is to add an IOCTL to directly attach a syncobj file descriptor to an eventfd.
>>> That would be nice.
>>
>> Take a look at drm_syncobj_file_fops and how drm_syncobj_add_eventfd() is used. Adding that functionality shouldn't be more than a typing exercise.
> Yeah, this patchset already adds that functionality (on the new device).
>
>> Do I see it right that this would already solve most problems in the compositor side?
> Skipping the syncobj handle step would only reduce the amounts of
> ioctls the compositor does, but afaict it wouldn't solve any
> compositor problems. At least not as long as it's still tied to a drm
> device.
Yeah, you need something like a syncobj container or dummy DRM device.
> For device hotplugs, the only new thing we need for correctly handling
> syncobj is a way to receive errors on the eventfd.
I need to look into the eventfd code. It could be that this is somehow possible, but it's not something I have used before.
> A device-independent way to create and use syncobj would still be
> useful to us though, both to simplify the compositor and to improve
> the software rendering use cases.
Yeah, I'm not sure how to do that cleanly. We could have a dummy /dev/dri/rendersync or something like that, but that would be quite a hack.
At least I understand the requirement now.
Thanks,
Christian.
>
> - Xaver