* simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
       [not found] <9215788.EvYhyI6sBW.ref@nerdopolis2>
@ 2024-05-09 13:06 ` nerdopolis
  2024-05-10  7:29   ` Pekka Paalanen
                     ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: nerdopolis @ 2024-05-09 13:06 UTC (permalink / raw)
  To: dri-devel

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]

Hi

I have been made aware of an apparent race condition: when a native driver takes a bit longer to load, display servers/greeters can start on the simpledrm device and then run into problems once the real driver loads, because the simpledrm device they are using as their primary GPU goes away.

For example, Weston crashes, Xorg crashes, kwin aborts, and wlroots seems to stay running but doesn't draw anything on the screen.
To reproduce this, boot a QEMU machine with the virtio card and modprobe.blacklist=virtio_gpu on the kernel command line, and then, while the display server is running, run sudo modprobe virtio-gpu

This was recently reported here: https://github.com/sddm/sddm/issues/1917 [1] and here: https://github.com/systemd/systemd/issues/32509 [2]

My thinking: instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, would it be possible for simpledrm to simulate an unplug of the fake display/CRTC?
That way, in theory, the simpledrm device would be useless for drawing to the screen from that point on, since the real driver has taken over, but at least the display server would not lose its handles to the /dev/dri/card0 device (and the device could perhaps remove itself only once the final handle to it is closed?).

Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

Thanks

--------
[1] https://github.com/sddm/sddm/issues/1917
[2] https://github.com/systemd/systemd/issues/32509

[-- Attachment #2: Type: text/html, Size: 2371 bytes --]

* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-09 13:06 ` simpledrm, running display servers, and drivers replacing simpledrm while the display server is running nerdopolis
@ 2024-05-10  7:29   ` Pekka Paalanen
  2024-05-10  7:32   ` Thomas Zimmermann
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Pekka Paalanen @ 2024-05-10  7:29 UTC (permalink / raw)
  To: nerdopolis; +Cc: dri-devel

[-- Attachment #1: Type: text/plain, Size: 3264 bytes --]

On Thu, 09 May 2024 09:06:29 -0400
nerdopolis <bluescreen_avenger@verizon.net> wrote:

> Hi
> 
> So I have been made aware of an apparent race condition of some
> drivers taking a bit longer to load, which could lead to a possible
> race condition of display servers/greeters using the simpledrm
> device, and then experiencing problems once the real driver loads,
> the simpledrm device that the display servers are using as their
> primary GPU goes away. 
> 
> For example Weston crashes, Xorg crashes, wlroots seems to stay
> running, but doesn't draw anything on the screen, kwin aborts, This
> is if you boot on a QEMU machine with the virtio card, with
> modprobe.blacklist=virtio_gpu, and then, when the display server is
> running, run sudo modprobe virtio-gpu
> 
> Namely, it's been recently reported here:
> https://github.com/sddm/sddm/issues/1917[1] and here
> https://github.com/systemd/systemd/issues/32509[2]
> 
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away
> when the real driver loads, is it possible for simpledrm to instead
> simulate an unplug of the fake display/CRTC? That way in theory, the
> simpledrm device will now be useless for drawing for drawing to the
> screen at that point, since the real driver is now taken over, but
> this way here, at least the display server doesn't lose its handles
> to the /dev/dri/card0 device, (and then maybe only remove itself once
> the final handle to it closes?)
> 
> Is something like this possible to do with the way simpledrm works
> with the low level video memory? Or is this not possible?

Hi,

what you describe sounds similar to what has been agreed that drivers
should implement:
https://docs.kernel.org/gpu/drm-uapi.html#device-hot-unplug

That would be the first step. Then display servers would need fixing to
handle the hot-unplug. Then they would need to handle hot-plug of the
new DRM devices and ideally migrate to GPU accelerated compositing in
order to support GPU accelerated applications.

Simpledrm is not a GPU driver, and I assume that in the case you
describe, the GPU driver comes up later, just like the
hardware-specific display driver. Any userspace that initialized with
simpledrm will be using software rendering. Ideally if a hardware
rendering GPU driver turns up later and is usable with the displays,
userspace would migrate to that.

Essentially this is a display/GPU device switch. In general that's a
big problem, needing all applications to be able to handle a GPU
disappearing and another GPU appearing, and not die in between. For
the simpledrm case it is easier, because the migration is from no GPU
to a maybe GPU. So applications like Wayland clients could stay alive
as-is, they just don't use a GPU until they restart.

The problem is making display servers handle this switch of display
devices and a GPU hotplug. Theoretically I believe it is doable. E.g.
Weston used to be able to migrate from pixman-renderer to GL-renderer,
but I suspect it is lacking support for hot-unplug of the "main" DRM
display device.
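
As a rough illustration of the sequence described above (hot-unplug of the primary display device, then hot-plug of a new one, with clients staying alive in between), here is a toy Python model. It makes no DRM calls at all; the class and method names are hypothetical and only show the state a display server would need to track.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DisplayServerModel:
    """Toy model of a display server's device state (hypothetical, no DRM calls)."""
    primary: Optional[str] = None      # name of the device used for scanout
    renderer: str = "software"         # "software" or "gpu"
    log: List[str] = field(default_factory=list)

    def on_device_added(self, name: str, has_render_gpu: bool) -> None:
        # Adopt a new device only when there is no primary device right now.
        if self.primary is None:
            self.primary = name
            self.renderer = "gpu" if has_render_gpu else "software"
            self.log.append(f"adopt {name} ({self.renderer})")
        else:
            self.log.append(f"ignore {name}")

    def on_device_removed(self, name: str) -> None:
        # Losing the primary device must not be fatal: keep running without
        # any output and fall back to software rendering until a device appears.
        if name == self.primary:
            self.primary = None
            self.renderer = "software"
            self.log.append(f"lost {name}, running without output")
```

In this model the simpledrm case is the sequence on_device_added("card0", False), on_device_removed("card0"), on_device_added("card1", True); the device names are placeholders.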


Thanks,
pq

> Thanks
> 
> --------
> [1] https://github.com/sddm/sddm/issues/1917
> [2] https://github.com/systemd/systemd/issues/32509


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-09 13:06 ` simpledrm, running display servers, and drivers replacing simpledrm while the display server is running nerdopolis
  2024-05-10  7:29   ` Pekka Paalanen
@ 2024-05-10  7:32   ` Thomas Zimmermann
  2024-05-10  9:49     ` Jonas Ådahl
  2024-05-10  7:36   ` Javier Martinez Canillas
  2024-05-21 10:06   ` Jani Nikula
  3 siblings, 1 reply; 10+ messages in thread
From: Thomas Zimmermann @ 2024-05-10  7:32 UTC (permalink / raw)
  To: nerdopolis, dri-devel

Hi

Am 09.05.24 um 15:06 schrieb nerdopolis:
>
> Hi
>
>
> So I have been made aware of an apparent race condition of some 
> drivers taking a bit longer to load, which could lead to a possible 
> race condition of display servers/greeters using the simpledrm device, 
> and then experiencing problems once the real driver loads, the 
> simpledrm device that the display servers are using as their primary 
> GPU goes away.
>
>
> For example Weston crashes, Xorg crashes, wlroots seems to stay 
> running, but doesn't draw anything on the screen, kwin aborts,
>
> This is if you boot on a QEMU machine with the virtio card, with 
> modprobe.blacklist=virtio_gpu, and then, when the display server is 
> running, run sudo modprobe virtio-gpu
>
>
> Namely, it's been recently reported here: 
> https://github.com/sddm/sddm/issues/1917 and here 
> https://github.com/systemd/systemd/issues/32509
>
>
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away 
> when the real driver loads, is it possible for simpledrm to instead 
> simulate an unplug of the fake display/CRTC?
>

To my knowledge, there's no hotplugging for CRTCs.

> That way in theory, the simpledrm device will now be useless for 
> drawing for drawing to the screen at that point, since the real driver 
> is now taken over, but this way here, at least the display server 
> doesn't lose its handles to the /dev/dri/card0 device, (and then maybe 
> only remove itself once the final handle to it closes?)
>
>
> Is something like this possible to do with the way simpledrm works 
> with the low level video memory? Or is this not possible?
>

Userspace needs to be prepared for graphics devices being hotplugged. 
The correct solution is to make compositors work without graphics devices.

The next best solution is to keep the final DRM device open until a new 
one shows up. All DRM graphics drivers with hotplugging support are 
required to accept commands after their hardware has been unplugged. 
They simply won't display anything.
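
A minimal sketch of the "until a new one shows up" part, in Python. It only polls a directory for cardN entries; the function names, the polling approach and the timeout values are assumptions made for illustration. A real compositor would more likely react to udev events, and would keep its file descriptor to the old device open while waiting.

```python
import os
import time


def list_card_nodes(dri_dir="/dev/dri"):
    """Return the set of cardN entries currently present in dri_dir."""
    try:
        names = os.listdir(dri_dir)
    except FileNotFoundError:
        return set()
    return {n for n in names if n.startswith("card") and n[4:].isdigit()}


def wait_for_replacement(known, dri_dir="/dev/dri", timeout=10.0, interval=0.1):
    """Poll until a card node that is not in `known` appears.

    Returns the name of the new node, or None if the timeout expires.
    Limitation: a new device that reuses an old node name is not detected.
    """
    deadline = time.monotonic() + timeout
    while True:
        fresh = list_card_nodes(dri_dir) - set(known)
        if fresh:
            # Pick the lowest-numbered new node.
            return min(fresh, key=lambda n: int(n[4:]))
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Typical use would be to record list_card_nodes() at startup and call wait_for_replacement() with that set once the current device reports that it is gone.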

Best regards
Thomas


>
> Thanks
>

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-09 13:06 ` simpledrm, running display servers, and drivers replacing simpledrm while the display server is running nerdopolis
  2024-05-10  7:29   ` Pekka Paalanen
  2024-05-10  7:32   ` Thomas Zimmermann
@ 2024-05-10  7:36   ` Javier Martinez Canillas
  2024-05-21 10:06   ` Jani Nikula
  3 siblings, 0 replies; 10+ messages in thread
From: Javier Martinez Canillas @ 2024-05-10  7:36 UTC (permalink / raw)
  To: nerdopolis, dri-devel

nerdopolis <bluescreen_avenger@verizon.net> writes:

Hello,

> Hi
>
> So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition of display servers/greeters using the simpledrm device, and then experiencing problems once the real driver loads, the simpledrm device that the display servers are using as their primary GPU goes away. 
>

Plymouth also had this issue, and that is the reason why simpledrm is not
treated as a KMS device by default (unless plymouth.use-simpledrm is used).

> For example Weston crashes, Xorg crashes, wlroots seems to stay running, but doesn't draw anything on the screen, kwin aborts, 
> This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu
>
> Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917[1] and here https://github.com/systemd/systemd/issues/32509[2]
>
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC?
> That way in theory, the simpledrm device will now be useless for drawing for drawing to the screen at that point, since the real driver is now taken over, but this way here, at least the display server doesn't lose its handles to the /dev/dri/card0 device, (and then maybe only remove itself once the final handle to it closes?)
>
> Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?
>

The way it works is that when a native DRM driver is probed, it calls
drm_aperture_remove_conflicting_framebuffers() to kick out the generic
system framebuffer video drivers, and the aperture infrastructure
unregisters the device (e.g. "simple-framebuffer", "efi-framebuffer", etc.).

So it is not only that the /dev/dri/card0 devnode is unregistered, but that
the underlying platform device bound to the simpledrm/efifb/vesafb/simplefb
driver is unregistered as well, which leads to the driver being unbound
from it by the Linux device model infrastructure.

But also, these seem to be user-space bugs to me, and doing anything in
the kernel would be papering over the real problem IMO.
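
For illustration, userspace can see which driver is bound to the device behind a DRM card by following the device/driver symlink in sysfs. The Python sketch below assumes the usual /sys/class/drm/cardN layout; the helper names are hypothetical, and the set of driver names is an assumption taken from the examples above that should be checked on the target system.

```python
import os

# Assumption: names under which firmware framebuffer devices show up in sysfs.
ASSUMED_FIRMWARE_FB_DRIVERS = frozenset({"simple-framebuffer", "efi-framebuffer"})


def bound_driver(card_sysfs="/sys/class/drm/card0"):
    """Return the name of the driver bound to the card's parent device, or None."""
    link = os.path.join(card_sysfs, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        # No such card, no bound driver, or the entry is not a symlink.
        return None


def is_firmware_framebuffer(card_sysfs="/sys/class/drm/card0",
                            names=ASSUMED_FIRMWARE_FB_DRIVERS):
    """True if the card appears to be backed by a firmware framebuffer driver."""
    return bound_driver(card_sysfs) in names
```

A display server could use such a check to know that the device it is about to open may be removed as soon as a native driver is probed.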

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-10  7:32   ` Thomas Zimmermann
@ 2024-05-10  9:49     ` Jonas Ådahl
  2024-05-10 12:45       ` Thomas Zimmermann
  0 siblings, 1 reply; 10+ messages in thread
From: Jonas Ådahl @ 2024-05-10  9:49 UTC (permalink / raw)
  To: Thomas Zimmermann; +Cc: nerdopolis, dri-devel

On Fri, May 10, 2024 at 09:32:02AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 09.05.24 um 15:06 schrieb nerdopolis:
> > 
> > Hi
> > 
> > 
> > So I have been made aware of an apparent race condition of some drivers
> > taking a bit longer to load, which could lead to a possible race
> > condition of display servers/greeters using the simpledrm device, and
> > then experiencing problems once the real driver loads, the simpledrm
> > device that the display servers are using as their primary GPU goes
> > away.
> > 
> > 
> > For example Weston crashes, Xorg crashes, wlroots seems to stay running,
> > but doesn't draw anything on the screen, kwin aborts,
> > 
> > This is if you boot on a QEMU machine with the virtio card, with
> > modprobe.blacklist=virtio_gpu, and then, when the display server is
> > running, run sudo modprobe virtio-gpu
> > 
> > 
> > Namely, it's been recently reported here:
> > https://github.com/sddm/sddm/issues/1917 and here
> > https://github.com/systemd/systemd/issues/32509
> > 
> > 
> > My thinking: Instead of simpledrm's /dev/dri/card0 device going away
> > when the real driver loads, is it possible for simpledrm to instead
> > simulate an unplug of the fake display/CRTC?
> > 
> 
> To my knowledge, there's no hotplugging for CRTCs.
> 
> > That way in theory, the simpledrm device will now be useless for drawing
> > for drawing to the screen at that point, since the real driver is now
> > taken over, but this way here, at least the display server doesn't lose
> > its handles to the /dev/dri/card0 device, (and then maybe only remove
> > itself once the final handle to it closes?)
> > 
> > 
> > Is something like this possible to do with the way simpledrm works with
> > the low level video memory? Or is this not possible?
> > 
> 
> Userspace needs to be prepared that graphics devices can do hotplugging. The
> correct solution is to make compositors work without graphics devices.

(This was discussed on #dri-devel, but I'll reiterate here as well).

There are two problems at hand; one is the race condition during boot
when the login screen (or whatever display server appears first) is
launched with simpledrm, only some moments later having the real GPU
driver appear.

The other is general purpose GPU hotplugging, including the unplugging
of the GPU that the compositor has chosen as the primary one.

The latter is something that should be handled in userspace, by
compositors, etc, I agree.

The former, however, is not properly solved by userspace learning how to
deal with primary GPU unplugging and switching to using a real GPU
driver, as it'd break the booting and login experience.

When it works, i.e. the race condition is not hit, the sequence is this:

 * System boots
 * Plymouth shows a "splash" screen
 * The login screen display server is launched with the real GPU driver
 * The login screen interface is smoothly animated using hardware
   acceleration, presenting "advanced" graphical content depending on
   hardware capabilities (e.g. high color bit depth, HDR, and so on)

If the race condition is hit, with a compositor supporting primary GPU
hotplugging, it'll work like this:

 * System boots
 * Plymouth shows a "splash" screen
 * The login screen display server is launched with simpledrm
 * Due to using simpledrm, the login screen interface is not animated and
   just plops up, and no "advanced" graphical content is enabled due to
   apparently missing hardware capabilities
 * The real GPU driver appears, the login screen now starts to become
   animated, and may suddenly change appearance due to capabilities
   having changed

Thus, by just supporting hotplugging the primary GPU in userspace, we'll
still end up with a glitchy boot experience, and it forces userspace to
add things like sleep(10) to work around this.

In other words, fixing userspace is *not* a correct solution to the
problem, it's a workaround (albeit a behavior we want for other
reasons) for the race condition.

Arguably, the only place that can make a more educated guess about
whether to wait or not, and if so for how long, is the kernel.


Jonas

> 
> The next best solution is to keep the final DRM device open until a new one
> shows up. All DRM graphics drivers with hotplugging support are required to
> accept commands after their hardware has been unplugged. They simply won't
> display anything.
> 
> Best regards
> Thomas
> 
> 
> > 
> > Thanks
> > 
> 
> -- 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 

* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-10  9:49     ` Jonas Ådahl
@ 2024-05-10 12:45       ` Thomas Zimmermann
  2024-05-10 13:11         ` Jonas Ådahl
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Zimmermann @ 2024-05-10 12:45 UTC (permalink / raw)
  To: Jonas Ådahl; +Cc: nerdopolis, dri-devel

Hi

> (This was discussed on #dri-devel, but I'll reiterate here as well).
>
> There are two problems at hand; one is the race condition during boot
> when the login screen (or whatever display server appears first) is
> launched with simpledrm, only some moments later having the real GPU
> driver appear.
>
> The other is general purpose GPU hotplugging, including the unplugging
> the GPU decided by the compositor to be the primary one.

The situation of booting with simpledrm (your first problem) is a special 
case of the second. From the kernel's perspective, unloading simpledrm is 
the same as what you call general purpose GPU hotplugging, even though 
there is not a full GPU, only a trivial scanout buffer. In userspace, you 
see the same sequence of events as in the general case.

>
> The latter is something that should be handled in userspace, by
> compositors, etc, I agree.
>
> The former, however, is not properly solved by userspace learning how to
> deal with primary GPU unplugging and switching to using a real GPU
> driver, as it'd break the booting and login experience.
>
> When it works, i.e. the race condition is not hit, is this:
>
>   * System boots
>   * Plymouth shows a "splash" screen
>   * The login screen display server is launched with the real GPU driver
>   * The login screen interface is smoothly animating using hardware
>     accelerating, presenting "advanced" graphical content depending on
>     hardware capabilities (e.g. high color bit depth, HDR, and so on)
>
> If the race condition is hit, with a compositor supporting primary GPU
> hotplugging, it'll work like this:
>
>   * System boots
>   * Plymouth shows a "splash" screen
>   * The login screen display server is launched with simpledrm
>   * Due to using simpldrm, the login screen interface is not animated and
>     just plops up, and no "advanced" graphical content is enabled due to
>     apparent missing hardware capabilities
>   * The real GPU driver appears, the login screen now starts to become
>     animated, and may suddenly change appearance due to capabilties
>     having changed
>
> Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> still end up with a glitchy boot experience, and it forces userspace to
> add things like sleep(10) to work around this.
>
> In other words, fixing userspace is *not* a correct solution to the
> problem, it's a work around (albeit a behaivor we want for other
> reasons) for the race condition.

To really fix the flickering, you need to read the old DRM device's 
atomic state and apply it to the new device. Then tell the desktop and 
applications to re-init their rendering stack.

Depending on the DRM driver and its hardware, it might be possible to do 
this without flickering. The key is to not lose the original scanout 
buffer while probing the new device driver. But that needs work in 
each individual DRM driver.

>
> Arguably, the only place a more educated guess about whether to wait or
> not, and if so how long, is the kernel.

As I said before, driver modules come and go and hardware devices come 
and go.

To detect if there might be a native driver waiting to be loaded, you 
can test for

- 'nomodeset' on the command line -> no native driver
- 'systemd-modules-load' not started -> maybe wait
- look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/
  -> maybe wait
- maybe udev can tell you more
- it might help detection that simpledrm devices have recently started
  to refer to their parent PCI device
- maybe systemd tracks the probed devices
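
The first check in the list above could look roughly like this in Python. It is a sketch under assumptions: the helper names are made up for illustration, the command line would normally be read from /proc/cmdline, and modprobe.blacklist= is handled only because the reproduction at the start of this thread uses it.

```python
def has_nomodeset(cmdline):
    """True if 'nomodeset' appears as a standalone kernel parameter."""
    return "nomodeset" in cmdline.split()


def blacklisted_modules(cmdline):
    """Collect module names from every modprobe.blacklist=a,b,... parameter."""
    mods = set()
    for token in cmdline.split():
        if token.startswith("modprobe.blacklist="):
            value = token.split("=", 1)[1]
            mods.update(m for m in value.split(",") if m)
    return mods


def may_wait_for_native_driver(cmdline, on_firmware_fb):
    """Heuristic: waiting only makes sense on a firmware framebuffer device
    and when 'nomodeset' has not ruled out native drivers."""
    if not on_firmware_fb:
        return False
    return not has_nomodeset(cmdline)
```

The result is only a hint that a native driver may still appear; it says nothing about how long that will take.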

Best regards
Thomas

>
>
> Jonas
>
>> The next best solution is to keep the final DRM device open until a new one
>> shows up. All DRM graphics drivers with hotplugging support are required to
>> accept commands after their hardware has been unplugged. They simply won't
>> display anything.
>>
>> Best regards
>> Thomas
>>
>>
>>> Thanks
>>>
>> -- 
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Frankenstrasse 146, 90461 Nuernberg, Germany
>> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
>> HRB 36809 (AG Nuernberg)
>>

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-10 12:45       ` Thomas Zimmermann
@ 2024-05-10 13:11         ` Jonas Ådahl
  2024-05-20  2:25           ` nerdopolis
  2024-05-21 12:52           ` Daniel Vetter
  0 siblings, 2 replies; 10+ messages in thread
From: Jonas Ådahl @ 2024-05-10 13:11 UTC (permalink / raw)
  To: Thomas Zimmermann; +Cc: nerdopolis, dri-devel

On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > 
> > There are two problems at hand; one is the race condition during boot
> > when the login screen (or whatever display server appears first) is
> > launched with simpledrm, only some moments later having the real GPU
> > driver appear.
> > 
> > The other is general purpose GPU hotplugging, including the unplugging
> > the GPU decided by the compositor to be the primary one.
> 
> The situation of booting with simpledrm (problem 2) is a special case of
> problem 1. From the kernel's perspective, unloading simpledrm is the same as
> what you call general purpose GPU hotplugging. Even through there is not a
> full GPU, but a trivial scanout buffer. In userspace, you see the same
> sequence of events as in the general case.

Sure, in a way it is, but the consequences and frequency of occurrence are
quite different, so I think it makes sense to think of them as different
problems, since they need different solutions. One is about fixing
userspace components' support for arbitrary hotplugging, the other about
mitigating the race condition that caused this discussion to begin with.

> 
> > 
> > The latter is something that should be handled in userspace, by
> > compositors, etc, I agree.
> > 
> > The former, however, is not properly solved by userspace learning how to
> > deal with primary GPU unplugging and switching to using a real GPU
> > driver, as it'd break the booting and login experience.
> > 
> > When it works, i.e. the race condition is not hit, is this:
> > 
> >   * System boots
> >   * Plymouth shows a "splash" screen
> >   * The login screen display server is launched with the real GPU driver
> >   * The login screen interface is smoothly animating using hardware
> >     accelerating, presenting "advanced" graphical content depending on
> >     hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > 
> > If the race condition is hit, with a compositor supporting primary GPU
> > hotplugging, it'll work like this:
> > 
> >   * System boots
> >   * Plymouth shows a "splash" screen
> >   * The login screen display server is launched with simpledrm
> >   * Due to using simpldrm, the login screen interface is not animated and
> >     just plops up, and no "advanced" graphical content is enabled due to
> >     apparent missing hardware capabilities
> >   * The real GPU driver appears, the login screen now starts to become
> >     animated, and may suddenly change appearance due to capabilties
> >     having changed
> > 
> > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > still end up with a glitchy boot experience, and it forces userspace to
> > add things like sleep(10) to work around this.
> > 
> > In other words, fixing userspace is *not* a correct solution to the
> > problem, it's a work around (albeit a behaivor we want for other
> > reasons) for the race condition.
> 
> To really fix the flickering, you need to read the old DRM device's atomic
> state and apply it to the new device. Then tell the desktop and applications
> to re-init their rendering stack.
> 
> Depending on the DRM driver and its hardware, it might be possible to do
> this without flickering. The key is to not loose the original scanout
> buffer, while not probing the new device driver. But that needs work in each
> individual DRM driver.

This doesn't sound like it'll fix any of the flickering as I described it.
First, the loss of initial animation when the login interface appears is
not something one can "fix", since it has already happened.

Avoiding flickering when switching to the new driver is only possible
if one limits oneself to what simpledrm was capable of doing, i.e. no
HDR signaling etc.

> 
> > 
> > Arguably, the only place a more educated guess about whether to wait or
> > not, and if so how long, is the kernel.
> 
> As I said before, driver modules come and go and hardware devices come and
> go.
> 
> To detect if there might be a native driver waiting to be loaded, you can
> test for
> 
> - 'nomodeset' on the command line -> no native driver

Makes sense to not wait here, and just assume simpledrm forever.

> - 'systemd-load-modules' not started -> maybe wait
> - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> maybe wait

I suspect this is not useful for general purpose distributions. I have
43 kernel GPU modules there, on a F40 installation.

> - maybe udev can tell you more
> - it might for detection help that recently simpledrm devices refer to their
> parent PCI device
> - maybe systemd tracks the probed devices

If the kernel already plumbs enough state so userspace components can
make a decent decision, instead of just sleeping for an arbitrary amount
of time, then great. This is to some degree what
https://github.com/systemd/systemd/issues/32509 is about.


Jonas

> 
> Best regards
> Thomas
> 
> > 
> > 
> > Jonas
> > 
> > > The next best solution is to keep the final DRM device open until a new one
> > > shows up. All DRM graphics drivers with hotplugging support are required to
> > > accept commands after their hardware has been unplugged. They simply won't
> > > display anything.
> > > 
> > > Best regards
> > > Thomas
> > > 
> > > 
> > > > Thanks
> > > > 
> > > -- 
> > > --
> > > Thomas Zimmermann
> > > Graphics Driver Developer
> > > SUSE Software Solutions Germany GmbH
> > > Frankenstrasse 146, 90461 Nuernberg, Germany
> > > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> > > HRB 36809 (AG Nuernberg)
> > > 
> 
> -- 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 

* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-10 13:11         ` Jonas Ådahl
@ 2024-05-20  2:25           ` nerdopolis
  2024-05-21 12:52           ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: nerdopolis @ 2024-05-20  2:25 UTC (permalink / raw)
  To: Thomas Zimmermann, dri-devel, Jonas Ådahl

[-- Attachment #1: Type: text/plain, Size: 6210 bytes --]

On Friday, May 10, 2024 9:11:13 AM EDT Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > > 
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, only some moments later having the real GPU
> > > driver appear.
> > > 
> > > The other is general purpose GPU hotplugging, including the unplugging
> > > the GPU decided by the compositor to be the primary one.
> > 
> > The situation of booting with simpledrm (problem 2) is a special case of
> > problem 1. From the kernel's perspective, unloading simpledrm is the same as
> > what you call general purpose GPU hotplugging. Even through there is not a
> > full GPU, but a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
> 
> Sure, in a way it is, but the consequence and frequency of occurence is
> quite different, so I think it makes sense to think of them as different
> problems, since they need different solutions. One is about fixing
> userspace components support for arbitrary hotplugging, the other for
> mitigating the race condition that caused this discussion to begin with.
> 
> > 
> > > 
> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > > 
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > > 
> > > When it works, i.e. the race condition is not hit, is this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with the real GPU driver
> > >   * The login screen interface is smoothly animating using hardware
> > >     accelerating, presenting "advanced" graphical content depending on
> > >     hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > > 
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with simpledrm
> > >   * Due to using simpldrm, the login screen interface is not animated and
> > >     just plops up, and no "advanced" graphical content is enabled due to
> > >     apparent missing hardware capabilities
> > >   * The real GPU driver appears, the login screen now starts to become
> > >     animated, and may suddenly change appearance due to capabilties
> > >     having changed
> > > 
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > > 
> > > In other words, fixing userspace is *not* a correct solution to the
> > > problem, it's a work around (albeit a behaivor we want for other
> > > reasons) for the race condition.
> > 
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> > 
> > Depending on the DRM driver and its hardware, it might be possible to do
> > > this without flickering. The key is to not lose the original scanout
> > buffer, while not probing the new device driver. But that needs work in each
> > individual DRM driver.
> 
> This doesn't sound like it'll fix any flickering as I describe them.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
> 
I suspect that whatever animations a login screen has will be along the
lines of a fade-in or a sliding animation, i.e. on the simple side.

llvmpipe should be good enough for animations like that these days, I would
think. Or is it really bad on very old CPUs, such as a Pentium III?

> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.
> 
> > 
> > > 
> > > > Arguably, the only place where a more educated guess about whether to
> > > > wait or not, and if so for how long, can be made is the kernel.
> > 
> > As I said before, driver modules come and go and hardware devices come and
> > go.
> > 
> > To detect if there might be a native driver waiting to be loaded, you can
> > test for
> > 
> > - 'nomodeset' on the command line -> no native driver
> 
> Makes sense to not wait here, and just assume simpledrm forever.
> 
> > > - 'systemd-modules-load' not started -> maybe wait
> > - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> > maybe wait
> 
> I suspect this is not useful for general purpose distributions. I have
> 43 kernel GPU modules there, on a F40 installation.
> 
> > - maybe udev can tell you more
> > > - it might help detection that simpledrm devices have recently started to
> > > refer to their parent PCI device
> > - maybe systemd tracks the probed devices
> 
> If the kernel already plumbs enough state so userspace components can
> make a decent decision, instead of just sleeping for an arbitrary amount
> of time, then great. This is to some degree what
> https://github.com/systemd/systemd/issues/32509 is about.
> 
> 
> Jonas
> 
> > 
> > Best regards
> > Thomas
> > 
> > > 
> > > 
> > > Jonas
> > > 
> > > > The next best solution is to keep the final DRM device open until a new one
> > > > shows up. All DRM graphics drivers with hotplugging support are required to
> > > > accept commands after their hardware has been unplugged. They simply won't
> > > > display anything.
> > > > 
> > > > Best regards
> > > > Thomas
> > > > 
> > > > 
> > > > > Thanks
> > > > > 
> > 
> 
> 





* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-09 13:06 ` simpledrm, running display servers, and drivers replacing simpledrm while the display server is running nerdopolis
                     ` (2 preceding siblings ...)
  2024-05-10  7:36   ` Javier Martinez Canillas
@ 2024-05-21 10:06   ` Jani Nikula
  3 siblings, 0 replies; 10+ messages in thread
From: Jani Nikula @ 2024-05-21 10:06 UTC (permalink / raw)
  To: nerdopolis, dri-devel
  Cc: Pekka Paalanen, Thomas Zimmermann, Jonas Ådahl,
	Javier Martinez Canillas

On Thu, 09 May 2024, nerdopolis <bluescreen_avenger@verizon.net> wrote:
> Hi
>
> So I have been made aware of an apparent race condition of some drivers taking a bit longer to load, which could lead to a possible race condition of display servers/greeters using the simpledrm device, and then experiencing problems once the real driver loads, the simpledrm device that the display servers are using as their primary GPU goes away. 
>
> For example Weston crashes, Xorg crashes, wlroots seems to stay running, but doesn't draw anything on the screen, kwin aborts, 
> This is if you boot on a QEMU machine with the virtio card, with modprobe.blacklist=virtio_gpu, and then, when the display server is running, run sudo modprobe virtio-gpu
>
> Namely, it's been recently reported here: https://github.com/sddm/sddm/issues/1917[1] and here https://github.com/systemd/systemd/issues/32509[2]
>
> My thinking: Instead of simpledrm's /dev/dri/card0 device going away when the real driver loads, is it possible for simpledrm to instead simulate an unplug of the fake display/CRTC?
> That way in theory, the simpledrm device will now be useless for drawing to the screen at that point, since the real driver has now taken over, but this way, at least the display server doesn't lose its handles to the /dev/dri/card0 device, (and then maybe only remove itself once the final handle to it closes?)
>
> Is something like this possible to do with the way simpledrm works with the low level video memory? Or is this not possible?

Related [1][2].

BR,
Jani.


[1] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10133
[2] https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11158


>
> Thanks
>
> --------
> [1] https://github.com/sddm/sddm/issues/1917
> [2] https://github.com/systemd/systemd/issues/32509

-- 
Jani Nikula, Intel


* Re: simpledrm, running display servers, and drivers replacing simpledrm while the display server is running
  2024-05-10 13:11         ` Jonas Ådahl
  2024-05-20  2:25           ` nerdopolis
@ 2024-05-21 12:52           ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Vetter @ 2024-05-21 12:52 UTC (permalink / raw)
  To: Jonas Ådahl; +Cc: Thomas Zimmermann, nerdopolis, dri-devel

On Fri, May 10, 2024 at 03:11:13PM +0200, Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > > 
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, only some moments later having the real GPU
> > > driver appear.
> > > 
> > > The other is general purpose GPU hotplugging, including the unplugging
> > > the GPU decided by the compositor to be the primary one.
> > 
> > The situation of booting with simpledrm (problem 2) is a special case of
> > problem 1. From the kernel's perspective, unloading simpledrm is the same as
> > > what you call general purpose GPU hotplugging. Even though there is not a
> > full GPU, but a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
> 
> > Sure, in a way it is, but the consequence and frequency of occurrence is
> quite different, so I think it makes sense to think of them as different
> problems, since they need different solutions. One is about fixing
> userspace components support for arbitrary hotplugging, the other for
> mitigating the race condition that caused this discussion to begin with.

We're trying to document the hotunplug consensus here:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug

And yes, hotunplug is really rough on userspace, but if that doesn't work,
we need to discuss what should be done instead in general. I agree with
Thomas that simpledrm really isn't special in that regard.
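
To illustrate the "keep the old device open until a new one shows up" idea
mentioned earlier in the thread, here is a minimal sketch in Python. It is
not taken from any compositor: PrimaryGpuTracker, handle_event and the
"add"/"remove" action strings are hypothetical names, and the events are
assumed to come from whatever hotplug monitoring the display server already
has.

```python
# Hypothetical sketch: PrimaryGpuTracker and its fields are illustrative
# names only and do not correspond to any real compositor or library API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PrimaryGpuTracker:
    """Tracks which DRM node a display server treats as its primary device."""

    primary: Optional[str] = None      # e.g. "/dev/dri/card0"
    primary_unplugged: bool = False    # node removed, handle still held
    closed: List[str] = field(default_factory=list)  # handles released so far

    def handle_event(self, action: str, devnode: str) -> None:
        if action == "add":
            if self.primary is None:
                # First device seen becomes the primary one.
                self.primary = devnode
            elif self.primary_unplugged:
                # A replacement showed up: release the stale handle, switch.
                self.closed.append(self.primary)
                self.primary = devnode
                self.primary_unplugged = False
        elif action == "remove" and devnode == self.primary:
            # Keep the handle for now; only remember that output is dead.
            self.primary_unplugged = True
```

With this, a "remove" of the simpledrm node followed by an "add" of the real
driver's node switches the primary device without ever leaving the display
server without a handle.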

> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > > 
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > > 
> > > When it works, i.e. the race condition is not hit, is this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with the real GPU driver
> > >   * The login screen interface is smoothly animating using hardware
> > > >     acceleration, presenting "advanced" graphical content depending on
> > >     hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > > 
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with simpledrm
> > > >   * Due to using simpledrm, the login screen interface is not animated and
> > >     just plops up, and no "advanced" graphical content is enabled due to
> > >     apparent missing hardware capabilities
> > >   * The real GPU driver appears, the login screen now starts to become
> > > >     animated, and may suddenly change appearance due to capabilities
> > >     having changed
> > > 
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > > 
> > > In other words, fixing userspace is *not* a correct solution to the
> > > > problem, it's a workaround (albeit a behavior we want for other
> > > reasons) for the race condition.
> > 
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> > 
> > Depending on the DRM driver and its hardware, it might be possible to do
> > > this without flickering. The key is to not lose the original scanout
> > buffer, while not probing the new device driver. But that needs work in each
> > individual DRM driver.
> 
> This doesn't sound like it'll fix any flickering as I describe them.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
> 
> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.

As long as you use the atomic ioctls (I think at least) and the real
driver has full atomic state takeover support (only i915 to my knowledge),
and your userspace doesn't unnecessarily mess with the display state when
it takes over a new driver, then that should lead to flicker free boot
even across a simpledrm->real driver takeover.

If your userspace doesn't crash&burn ofc :-)

But it's a real steep ask of all components to get this right.

> > > > Arguably, the only place where a more educated guess about whether to
> > > > wait or not, and if so for how long, can be made is the kernel.
> > 
> > As I said before, driver modules come and go and hardware devices come and
> > go.
> > 
> > To detect if there might be a native driver waiting to be loaded, you can
> > test for
> > 
> > - 'nomodeset' on the command line -> no native driver
> 
> Makes sense to not wait here, and just assume simpledrm forever.
> 
> > > - 'systemd-modules-load' not started -> maybe wait
> > - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> > maybe wait
> 
> I suspect this is not useful for general purpose distributions. I have
> 43 kernel GPU modules there, on a F40 installation.
> 
> > - maybe udev can tell you more
> > > - it might help detection that simpledrm devices have recently started to
> > > refer to their parent PCI device
> > - maybe systemd tracks the probed devices
> 
> If the kernel already plumbs enough state so userspace components can
> make a decent decision, instead of just sleeping for an arbitrary amount
> of time, then great. This is to some degree what
> https://github.com/systemd/systemd/issues/32509 is about.

I think you can't avoid the timeout entirely for the use-case where the
user has disabled the real driver by not compiling it, and simpledrm would
be the only driver you'll ever get.

But that's just not going to happen on any default distro setup, so I
think it's ok if it sucks a bit.
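
A rough sketch of that kind of userspace heuristic, in Python and purely
illustrative: skip waiting when 'nomodeset' is on the kernel command line,
otherwise poll for a native driver with a bounded timeout. The function
names, the 10 second default and the native_driver_present callback are all
assumptions made for the example, not an existing interface.

```python
# Hypothetical helpers; names and the 10 s default are placeholders only.
import time
from typing import Callable


def has_nomodeset(cmdline: str) -> bool:
    """True if 'nomodeset' appears as a standalone kernel parameter."""
    return "nomodeset" in cmdline.split()


def wait_for_native_driver(
    cmdline: str,
    native_driver_present: Callable[[], bool],
    timeout_s: float = 10.0,
    poll_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if a native driver showed up, False to stay on simpledrm."""
    if has_nomodeset(cmdline):
        # No native driver will ever bind, so there is nothing to wait for.
        return False
    deadline = clock() + timeout_s
    while True:
        if native_driver_present():
            return True
        if clock() >= deadline:
            # Timed out: e.g. the real driver was never compiled.
            return False
        sleep(poll_s)
```

A caller would pass the contents of /proc/cmdline and a callback that checks
for a non-simpledrm DRM device by whatever means it already has.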

Cheers, Sima

> 
> 
> Jonas
> 
> > 
> > Best regards
> > Thomas
> > 
> > > 
> > > 
> > > Jonas
> > > 
> > > > The next best solution is to keep the final DRM device open until a new one
> > > > shows up. All DRM graphics drivers with hotplugging support are required to
> > > > accept commands after their hardware has been unplugged. They simply won't
> > > > display anything.
> > > > 
> > > > Best regards
> > > > Thomas
> > > > 
> > > > 
> > > > > Thanks
> > > > > 
> > > > -- 
> > > > --
> > > > Thomas Zimmermann
> > > > Graphics Driver Developer
> > > > SUSE Software Solutions Germany GmbH
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany
> > > > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> > > > HRB 36809 (AG Nuernberg)
> > > > 
> > 
> > -- 
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Software Solutions Germany GmbH
> > Frankenstrasse 146, 90461 Nuernberg, Germany
> > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> > HRB 36809 (AG Nuernberg)
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



Thread overview: 10+ messages
     [not found] <9215788.EvYhyI6sBW.ref@nerdopolis2>
2024-05-09 13:06 ` simpledrm, running display servers, and drivers replacing simpledrm while the display server is running nerdopolis
2024-05-10  7:29   ` Pekka Paalanen
2024-05-10  7:32   ` Thomas Zimmermann
2024-05-10  9:49     ` Jonas Ådahl
2024-05-10 12:45       ` Thomas Zimmermann
2024-05-10 13:11         ` Jonas Ådahl
2024-05-20  2:25           ` nerdopolis
2024-05-21 12:52           ` Daniel Vetter
2024-05-10  7:36   ` Javier Martinez Canillas
2024-05-21 10:06   ` Jani Nikula
