On Thu, Oct 17, 2024 at 09:59:10AM +0200, Christian König wrote:
Purpose of this implementation is to provide drivers a generic way to
recover with the help of userspace intervention. Different drivers may
have different ideas of a "wedged device" depending on their hardware
implementation, and hence the vendor agnostic nature of the event.
It is up to the drivers to decide when they see the need for recovery
and how they want to recover from the available methods.
The current implementation defines three recovery methods, of which
drivers can choose to support one or more. The preferred recovery
method is sent in the uevent environment as WEDGED=<method>.
Userspace consumers (sysadmin) can define udev rules to parse this event
and take respective action to recover the device.
=============== ==================================
Recovery method Consumer expectations
=============== ==================================
rebind          unbind + rebind driver
bus-reset       unbind + reset bus device + rebind
reboot          reboot system
=============== ==================================
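
For illustration only, the table above could be mapped to concrete
userspace steps roughly as follows. This is a minimal sketch, not part of
the patch: the helper names (wedged_method, recovery_steps), the use of
the PCI sysfs bind/unbind/reset attributes, and the choice of a per-device
"reset" file for the bus-reset case are all assumptions. It only plans the
writes; actually performing them would need root and a real device.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: the wedged device sits on the PCI bus. Other buses would
# need different paths.
SYSFS_PCI = "/sys/bus/pci"

# Methods named in the table above.
KNOWN_METHODS = ("rebind", "bus-reset", "reboot")


def wedged_method(env: Dict[str, str]) -> Optional[str]:
    """Extract the recovery method from a uevent environment, if present."""
    method = env.get("WEDGED")
    return method if method in KNOWN_METHODS else None


def recovery_steps(method: str, device: str, driver: str) -> List[Tuple[str, str]]:
    """Plan recovery as (target, value) pairs, in the order they should run.

    For sysfs writes, target is the file and value is what to write.
    A reboot is represented by the hypothetical marker ("reboot", "").
    """
    unbind = (f"{SYSFS_PCI}/drivers/{driver}/unbind", device)
    bind = (f"{SYSFS_PCI}/drivers/{driver}/bind", device)
    # Assumption: "reset bus device" is done via the device's reset attribute.
    reset = (f"{SYSFS_PCI}/devices/{device}/reset", "1")

    if method == "rebind":
        return [unbind, bind]
    if method == "bus-reset":
        return [unbind, reset, bind]
    if method == "reboot":
        return [("reboot", "")]
    raise ValueError(f"unknown recovery method: {method!r}")


if __name__ == "__main__":
    # Hypothetical event values, as a udev helper might receive them.
    env = {"WEDGED": "bus-reset"}
    for target, value in recovery_steps(wedged_method(env), "0000:03:00.0", "xe"):
        print(f"{target} <- {value!r}")
```

A udev rule matching on the WEDGED property could invoke a helper like
this with the device and driver names taken from the event.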
Well that sounds like userspace would need to be involved in recovery.
That in turn is a complete no-go since we at least need to signal all
dma_fences to unblock the kernel. In other words, things like a bus reset
need to happen inside the kernel and *not* in userspace.
What we can do is signal to userspace: Hey, a bus reset of device X
happened, maybe restart the container, daemon, or whatever service was
using this device.
Well, when we declare a device 'wedged' it is because we don't want to take
any drastic measures inside the kernel and want to leave it in a protected
and unusable state, so that users don't lose the display, for instance,
or at least so that the device stays in a debuggable state.
Then, the instructions here are meant to tell what could possibly be
attempted from userspace to get the device back to a usable state.
The 'wedge' mode (the one emitting this uevent) needs to be responsible
for signaling all the fences and everything needed for a clean unbind
and whatever next step might be indicated to userspace.
That should already be part of any wedged mode, regardless of the uevent
that informs userspace here.