public inbox for linux-kernel@vger.kernel.org
* [PATCH 00/12] Recover sysfb after DRM probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC
  To: dri-devel
  Cc: Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
	Christian König, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Thomas Zimmermann, Timur Kristóf,
	Tvrtko Ursulin, virtualization, Vitaly Prosyak

Almost a rite of passage for every DRM developer and most Linux users
is upgrading a DRM driver, updating boot flags or changing some config,
only to have the DRM driver fail at probe and leave a blank screen.

Currently there's no way to recover from a DRM driver probe failure.
PCI DRM drivers explicitly throw out the existing sysfb to get exclusive
access to the PCI resources, so if the probe fails the system is left
without a functioning display driver.

Add code to sysfb to recover the system framebuffer when a DRM driver's
probe fails. This means that when a DRM driver fails to load, the system
framebuffer driver is reloaded.

This works best with simpledrm. Without it, Xorg won't recover because
it still tries to load the vendor-specific driver, which usually ends up
not working at all. With simpledrm the system recovers cleanly, ending
up with a working console instead of a blank screen.

There's a caveat: some hardware might require a magic register write
to recover the EFI display. I'd appreciate it a lot if maintainers could
introduce a temporary failure in their driver's probe to validate that
sysfb recovers and they get a working console. The easiest way to check
is to add:
 /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
 dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
 ret = -EINVAL;
 goto out_error;
or similar (using the probe function's own error path) right after the
call to devm_aperture_remove_conflicting_pci_devices().
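
To make that check systematic, the failure can be injected at each init
step in turn rather than only right after the aperture removal. The
following is a user-space sketch, not kernel code: fake_probe(),
fb_present, restore_armed and the step loop are hypothetical stand-ins
for a driver's probe, the sysfb platform device and the devm restore
action. It encodes the property the series aims for: any failure after
the removal leaves a framebuffer behind, and only a fully successful
probe keeps it removed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state (user space, not kernel code). */
static bool fb_present = true;   /* a sysfb framebuffer device exists */
static bool driver_owns_display; /* the GPU driver finished probing */
static bool restore_armed;       /* the devm restore action is pending */

/* Stands in for devm_aperture_remove_conflicting_pci_devices(). */
static void model_remove_conflicting(void)
{
	fb_present = false;
	restore_armed = true;
}

/* Stands in for the devres release that follows a failed probe. */
static void model_release_on_failure(void)
{
	if (restore_armed)
		fb_present = true; /* sysfb_restore() */
	restore_armed = false;
}

/*
 * Hypothetical probe with num_steps init steps after the aperture removal.
 * fail_at < 0 means no injected failure; otherwise step fail_at fails with
 * -EINVAL, like the temporary "ret = -EINVAL; goto out_error;" hack.
 */
static int fake_probe(int num_steps, int fail_at)
{
	int ret = 0;

	model_remove_conflicting();

	for (int step = 0; step < num_steps; step++) {
		if (step == fail_at) {
			ret = -EINVAL;
			goto out_error;
		}
	}

	driver_owns_display = true;
	restore_armed = false; /* ..._done(): cancel the restore */
	return 0;

out_error:
	model_release_on_failure();
	return ret;
}

static void reset_model(void)
{
	fb_present = true;
	driver_owns_display = false;
	restore_armed = false;
}

/* True if a failure injected at every step leaves a framebuffer behind. */
static bool every_failure_restores_fb(int num_steps)
{
	for (int fail_at = 0; fail_at < num_steps; fail_at++) {
		reset_model();
		if (fake_probe(num_steps, fail_at) != -EINVAL)
			return false;
		if (!fb_present || driver_owns_display)
			return false;
	}
	return true;
}
```

On real hardware the property to confirm is the visible one: after the
forced failure, a console comes back on the firmware framebuffer.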

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ce Sun <cesun102@amd.com>
Cc: Chia-I Wu <olvaffe@gmail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Deepak Rawat <drawat.floss@gmail.com>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Cc: Hans de Goede <hansg@kernel.org>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Helge Deller <deller@gmx.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: linux-efi@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Mario Limonciello (AMD)" <superm1@kernel.org>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: nouveau@lists.freedesktop.org
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: spice-devel@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: "Timur Kristóf" <timur.kristof@gmail.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: virtualization@lists.linux.dev
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>

Zack Rusin (12):
  video/aperture: Add sysfb restore on DRM probe failure
  drm/vmwgfx: Use devm aperture helpers for sysfb restore on probe
    failure
  drm/xe: Use devm aperture helpers for sysfb restore on probe failure
  drm/amdgpu: Use devm aperture helpers for sysfb restore on probe
    failure
  drm/virtio: Add sysfb restore on probe failure
  drm/nouveau: Use devm aperture helpers for sysfb restore on probe
    failure
  drm/qxl: Use devm aperture helpers for sysfb restore on probe failure
  drm/vboxvideo: Use devm aperture helpers for sysfb restore on probe
    failure
  drm/hyperv: Add sysfb restore on probe failure
  drm/ast: Use devm aperture helpers for sysfb restore on probe failure
  drm/radeon: Use devm aperture helpers for sysfb restore on probe
    failure
  drm/i915: Use devm aperture helpers for sysfb restore on probe failure

 drivers/firmware/efi/sysfb_efi.c           |   2 +-
 drivers/firmware/sysfb.c                   | 191 +++++++++++++--------
 drivers/firmware/sysfb_simplefb.c          |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   7 +
 drivers/gpu/drm/ast/ast_drv.c              |  13 +-
 drivers/gpu/drm/hyperv/hyperv_drm_drv.c    |  23 +++
 drivers/gpu/drm/i915/i915_driver.c         |  13 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c      |  16 +-
 drivers/gpu/drm/qxl/qxl_drv.c              |  14 +-
 drivers/gpu/drm/radeon/radeon_drv.c        |  15 +-
 drivers/gpu/drm/vboxvideo/vbox_drv.c       |  13 +-
 drivers/gpu/drm/virtio/virtgpu_drv.c       |  29 ++++
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c        |  13 +-
 drivers/gpu/drm/xe/xe_device.c             |   7 +-
 drivers/gpu/drm/xe/xe_pci.c                |   7 +
 drivers/video/aperture.c                   |  54 ++++++
 include/linux/aperture.h                   |  14 ++
 include/linux/sysfb.h                      |   6 +
 19 files changed, 368 insertions(+), 88 deletions(-)

-- 
2.48.1



* [PATCH 01/12] video/aperture: Add sysfb restore on DRM probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC
  To: dri-devel
  Cc: Ard Biesheuvel, Thomas Zimmermann, Javier Martinez Canillas,
	Helge Deller, linux-efi, linux-kernel, linux-fbdev

When a DRM driver calls aperture_remove_conflicting_pci_devices(), the
firmware framebuffer (EFI, VESA, etc.) is disabled and its platform
device is unregistered. If the DRM driver's probe subsequently fails,
the user is left with no display output.

Add sysfb_restore() to re-enable the Generic System Framebuffers
support and re-create the platform device that was previously
unregistered by sysfb_disable().

Add devm_aperture_remove_conflicting_pci_devices() which wraps the
existing function and registers a devm action to automatically call
sysfb_restore() if the driver's probe fails or the driver is unloaded.
Drivers can call devm_aperture_remove_conflicting_pci_devices_done()
after successful probe to cancel the automatic restore.
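
The devm lifetime rules this relies on can be shown with a small
user-space model of a device's devres list. Everything here is
hypothetical (struct model_dev, MAX_ACTIONS, the model_* helpers); only
the behaviour is meant to match devm_add_action_or_reset(),
devm_remove_action() and the release of devres on probe failure or
unbind: actions run newest-first, a failed registration runs the action
immediately, and removing an action drops it without running it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 8 /* arbitrary capacity for the model */

typedef void (*action_fn)(void *data);

/* Hypothetical stand-in for a struct device's devres list. */
struct model_dev {
	action_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	int count;
};

/* Like devm_add_action_or_reset(): if registration fails, run it now. */
static int model_add_action_or_reset(struct model_dev *dev, action_fn fn,
				     void *data)
{
	if (dev->count == MAX_ACTIONS) {
		fn(data);
		return -ENOMEM;
	}
	dev->fn[dev->count] = fn;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Like devm_remove_action(): drop the newest matching (fn, data) entry
 * without running it. Returns false if nothing matched. */
static bool model_remove_action(struct model_dev *dev, action_fn fn,
				void *data)
{
	for (int i = dev->count - 1; i >= 0; i--) {
		if (dev->fn[i] == fn && dev->data[i] == data) {
			for (int j = i; j < dev->count - 1; j++) {
				dev->fn[j] = dev->fn[j + 1];
				dev->data[j] = dev->data[j + 1];
			}
			dev->count--;
			return true;
		}
	}
	return false;
}

/* Probe failure or unbind: run the remaining actions newest-first. */
static void model_release_all(struct model_dev *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->fn[dev->count](dev->data[dev->count]);
	}
}

static int restore_calls; /* how many times sysfb_restore() would run */

/* Stand-in for devm_aperture_restore_sysfb() calling sysfb_restore(). */
static void restore_sysfb(void *unused)
{
	(void)unused;
	restore_calls++;
}
```

Because devm_aperture_remove_conflicting_pci_devices_done() removes the
action, a later unbind of a successfully probed driver does not call
sysfb_restore(); only a probe that never reaches the _done() call does.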

Refactor sysfb_init() to use a shared __sysfb_create_device() helper
that can be called from both sysfb_init() and sysfb_restore(). Add a
quirks_applied flag to handle the edge case where a driver calls
sysfb_disable() before sysfb_init() runs; in this case sysfb_restore()
defers device creation to sysfb_init(), since the __init quirk functions
cannot be called after init memory is freed.
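
The ordering rules for sysfb_disable(), sysfb_restore() and sysfb_init()
can be checked in a user-space model. A pthread mutex stands in for
disable_lock, and pd_registered is a hypothetical stand-in for a live
pd; disabled and quirks_applied follow the names used in this patch.
The model omits the fixups and __init quirk calls and keeps only the
decisions made under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t disable_lock = PTHREAD_MUTEX_INITIALIZER;
static bool disabled;
static bool quirks_applied;
static bool pd_registered; /* hypothetical: "pd is a live device" */

/* Caller holds disable_lock; mirrors __sysfb_create_device()'s checks. */
static int model_create_device(void)
{
	if (pd_registered)
		return 0;       /* device already exists */
	if (!quirks_applied)
		return 0;       /* too early: leave it to sysfb_init() */
	pd_registered = true;
	return 0;
}

static void model_sysfb_disable(void)
{
	pthread_mutex_lock(&disable_lock);
	pd_registered = false;  /* platform_device_unregister(pd) */
	disabled = true;
	pthread_mutex_unlock(&disable_lock);
}

static int model_sysfb_restore(void)
{
	int ret;

	pthread_mutex_lock(&disable_lock);
	disabled = false;
	ret = model_create_device();
	pthread_mutex_unlock(&disable_lock);
	return ret;
}

static int model_sysfb_init(void)
{
	int ret = 0;

	/* screen_info fixups and __init quirks would run here */
	pthread_mutex_lock(&disable_lock);
	quirks_applied = true;
	if (!disabled)
		ret = model_create_device();
	pthread_mutex_unlock(&disable_lock);
	return ret;
}

static void model_reset(void)
{
	disabled = false;
	quirks_applied = false;
	pd_registered = false;
}
```

The first scenario worth checking is the edge case from the commit
message: an early disable followed by a restore before sysfb_init() must
leave device creation to sysfb_init(), while an early disable with no
restore must keep the device away.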

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-fbdev@vger.kernel.org
---
 drivers/firmware/efi/sysfb_efi.c  |   2 +-
 drivers/firmware/sysfb.c          | 191 +++++++++++++++++++-----------
 drivers/firmware/sysfb_simplefb.c |  10 +-
 drivers/video/aperture.c          |  54 +++++++++
 include/linux/aperture.h          |  14 +++
 include/linux/sysfb.h             |   6 +
 6 files changed, 201 insertions(+), 76 deletions(-)

diff --git a/drivers/firmware/efi/sysfb_efi.c b/drivers/firmware/efi/sysfb_efi.c
index 1e509595ac03..3fe7c57ad849 100644
--- a/drivers/firmware/efi/sysfb_efi.c
+++ b/drivers/firmware/efi/sysfb_efi.c
@@ -365,7 +365,7 @@ __init void sysfb_apply_efi_quirks(void)
 	}
 }
 
-__init void sysfb_set_efifb_fwnode(struct platform_device *pd)
+void sysfb_set_efifb_fwnode(struct platform_device *pd)
 {
 	if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI && IS_ENABLED(CONFIG_PCI)) {
 		fwnode_init(&efifb_fwnode, &efifb_fwnode_ops);
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 889e5b05c739..c45b6f487103 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -38,6 +38,7 @@
 static struct platform_device *pd;
 static DEFINE_MUTEX(disable_lock);
 static bool disabled;
+static bool quirks_applied;
 
 static struct device *sysfb_parent_dev(const struct screen_info *si);
 
@@ -79,6 +80,121 @@ void sysfb_disable(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(sysfb_disable);
 
+/* Caller must hold disable_lock */
+static int __sysfb_create_device(bool restore)
+{
+	struct screen_info *si = &screen_info;
+	struct device *parent;
+	unsigned int type;
+	struct simplefb_platform_data mode;
+	const char *name;
+	bool compatible;
+	int ret = 0;
+
+	if (!IS_ERR_OR_NULL(pd))
+		return 0;
+
+	/*
+	 * If quirks haven't been applied yet, sysfb_init() hasn't run.
+	 * Don't create the device now - let sysfb_init() do it after
+	 * applying the necessary fixups and quirks. We can't call
+	 * sysfb_apply_efi_quirks() here because it's __init.
+	 */
+	if (!quirks_applied)
+		return 0;
+
+	parent = sysfb_parent_dev(si);
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
+
+	type = screen_info_video_type(si);
+
+	/* try to create a simple-framebuffer device */
+	compatible = sysfb_parse_mode(si, &mode);
+	if (compatible) {
+		pd = sysfb_create_simplefb(si, &mode, parent);
+		if (!IS_ERR(pd)) {
+			if (restore)
+				pr_info("sysfb: restored simple-framebuffer device\n");
+			goto put_device;
+		}
+	}
+
+	/* if the FB is incompatible, create a legacy framebuffer device */
+	switch (type) {
+	case VIDEO_TYPE_EGAC:
+		name = "ega-framebuffer";
+		break;
+	case VIDEO_TYPE_VGAC:
+		name = "vga-framebuffer";
+		break;
+	case VIDEO_TYPE_VLFB:
+		name = "vesa-framebuffer";
+		break;
+	case VIDEO_TYPE_EFI:
+		name = "efi-framebuffer";
+		break;
+	default:
+		name = "platform-framebuffer";
+		break;
+	}
+
+	pd = platform_device_alloc(name, 0);
+	if (!pd) {
+		ret = -ENOMEM;
+		goto put_device;
+	}
+
+	pd->dev.parent = parent;
+
+	sysfb_set_efifb_fwnode(pd);
+
+	ret = platform_device_add_data(pd, si, sizeof(*si));
+	if (ret)
+		goto err;
+
+	ret = platform_device_add(pd);
+	if (ret)
+		goto err;
+
+	if (restore)
+		pr_info("sysfb: restored %s device\n", name);
+	goto put_device;
+err:
+	platform_device_put(pd);
+	pd = NULL;
+put_device:
+	put_device(parent);
+	return ret;
+}
+
+/**
+ * sysfb_restore() - restore the Generic System Framebuffer
+ *
+ * This function re-enables the Generic System Framebuffers support and
+ * re-creates the platform device that was previously unregistered by
+ * sysfb_disable(). This is intended for use by DRM drivers that need to
+ * restore the fallback framebuffer when their probe fails after having
+ * called aperture_remove_conflicting_devices() or similar.
+ *
+ * Context: The function can sleep. A @disable_lock mutex is acquired.
+ *
+ * Returns:
+ * 0 on success, or a negative errno value otherwise.
+ */
+int sysfb_restore(void)
+{
+	int ret;
+
+	mutex_lock(&disable_lock);
+	disabled = false;
+	ret = __sysfb_create_device(true);
+	mutex_unlock(&disable_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sysfb_restore);
+
 /**
  * sysfb_handles_screen_info() - reports if sysfb handles the global screen_info
  *
@@ -141,82 +257,17 @@ static struct device *sysfb_parent_dev(const struct screen_info *si)
 
 static __init int sysfb_init(void)
 {
-	struct screen_info *si = &screen_info;
-	struct device *parent;
-	unsigned int type;
-	struct simplefb_platform_data mode;
-	const char *name;
-	bool compatible;
 	int ret = 0;
 
 	screen_info_apply_fixups();
-
-	mutex_lock(&disable_lock);
-	if (disabled)
-		goto unlock_mutex;
-
 	sysfb_apply_efi_quirks();
 
-	parent = sysfb_parent_dev(si);
-	if (IS_ERR(parent)) {
-		ret = PTR_ERR(parent);
-		goto unlock_mutex;
-	}
-
-	/* try to create a simple-framebuffer device */
-	compatible = sysfb_parse_mode(si, &mode);
-	if (compatible) {
-		pd = sysfb_create_simplefb(si, &mode, parent);
-		if (!IS_ERR(pd))
-			goto put_device;
-	}
-
-	type = screen_info_video_type(si);
-
-	/* if the FB is incompatible, create a legacy framebuffer device */
-	switch (type) {
-	case VIDEO_TYPE_EGAC:
-		name = "ega-framebuffer";
-		break;
-	case VIDEO_TYPE_VGAC:
-		name = "vga-framebuffer";
-		break;
-	case VIDEO_TYPE_VLFB:
-		name = "vesa-framebuffer";
-		break;
-	case VIDEO_TYPE_EFI:
-		name = "efi-framebuffer";
-		break;
-	default:
-		name = "platform-framebuffer";
-		break;
-	}
-
-	pd = platform_device_alloc(name, 0);
-	if (!pd) {
-		ret = -ENOMEM;
-		goto put_device;
-	}
-
-	pd->dev.parent = parent;
-
-	sysfb_set_efifb_fwnode(pd);
-
-	ret = platform_device_add_data(pd, si, sizeof(*si));
-	if (ret)
-		goto err;
-
-	ret = platform_device_add(pd);
-	if (ret)
-		goto err;
-
-	goto put_device;
-err:
-	platform_device_put(pd);
-put_device:
-	put_device(parent);
-unlock_mutex:
+	mutex_lock(&disable_lock);
+	quirks_applied = true;
+	if (!disabled)
+		ret = __sysfb_create_device(false);
 	mutex_unlock(&disable_lock);
+
 	return ret;
 }
 
diff --git a/drivers/firmware/sysfb_simplefb.c b/drivers/firmware/sysfb_simplefb.c
index 592d8a644619..6fcbc3ae17d5 100644
--- a/drivers/firmware/sysfb_simplefb.c
+++ b/drivers/firmware/sysfb_simplefb.c
@@ -24,8 +24,8 @@ static const char simplefb_resname[] = "BOOTFB";
 static const struct simplefb_format formats[] = SIMPLEFB_FORMATS;
 
 /* try parsing screen_info into a simple-framebuffer mode struct */
-__init bool sysfb_parse_mode(const struct screen_info *si,
-			     struct simplefb_platform_data *mode)
+bool sysfb_parse_mode(const struct screen_info *si,
+		      struct simplefb_platform_data *mode)
 {
 	__u8 type;
 	u32 bits_per_pixel;
@@ -61,9 +61,9 @@ __init bool sysfb_parse_mode(const struct screen_info *si,
 	return false;
 }
 
-__init struct platform_device *sysfb_create_simplefb(const struct screen_info *si,
-						     const struct simplefb_platform_data *mode,
-						     struct device *parent)
+struct platform_device *sysfb_create_simplefb(const struct screen_info *si,
+					      const struct simplefb_platform_data *mode,
+					      struct device *parent)
 {
 	struct platform_device *pd;
 	struct resource res;
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 2b5a1e666e9b..4de6dc04a3fd 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -372,3 +372,57 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
 
 }
 EXPORT_SYMBOL(aperture_remove_conflicting_pci_devices);
+
+static void devm_aperture_restore_sysfb(void *unused)
+{
+	sysfb_restore();
+}
+
+/**
+ * devm_aperture_remove_conflicting_pci_devices - remove existing framebuffers
+ *                                                with sysfb restore on failure
+ * @pdev: PCI device
+ * @name: a descriptive name of the requesting driver
+ *
+ * This function removes devices that own apertures within any of @pdev's
+ * memory bars, similar to aperture_remove_conflicting_pci_devices().
+ *
+ * Additionally, it registers a devm action that will restore the system
+ * framebuffer if the driver's probe fails or the driver is unloaded. This
+ * ensures the user doesn't lose display output if the DRM driver probe fails
+ * after removing the firmware framebuffer.
+ *
+ * This function should be called early in the driver's probe function. The
+ * driver must call devm_aperture_remove_conflicting_pci_devices_done() after
+ * successfully completing probe to cancel the automatic restore.
+ *
+ * Returns:
+ * 0 on success, or a negative errno code otherwise
+ */
+int devm_aperture_remove_conflicting_pci_devices(struct pci_dev *pdev,
+						 const char *name)
+{
+	int ret;
+
+	ret = aperture_remove_conflicting_pci_devices(pdev, name);
+	if (ret)
+		return ret;
+
+	return devm_add_action_or_reset(&pdev->dev, devm_aperture_restore_sysfb,
+					NULL);
+}
+EXPORT_SYMBOL(devm_aperture_remove_conflicting_pci_devices);
+
+/**
+ * devm_aperture_remove_conflicting_pci_devices_done - cancel sysfb restore
+ * @pdev: PCI device
+ *
+ * Cancels the automatic sysfb restore action registered by
+ * devm_aperture_remove_conflicting_pci_devices(). Call this after the
+ * driver has successfully completed probe and registered its display.
+ */
+void devm_aperture_remove_conflicting_pci_devices_done(struct pci_dev *pdev)
+{
+	devm_remove_action(&pdev->dev, devm_aperture_restore_sysfb, NULL);
+}
+EXPORT_SYMBOL(devm_aperture_remove_conflicting_pci_devices_done);
diff --git a/include/linux/aperture.h b/include/linux/aperture.h
index 1a9a88b11584..ea0ece7f777e 100644
--- a/include/linux/aperture.h
+++ b/include/linux/aperture.h
@@ -19,6 +19,10 @@ int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si
 int __aperture_remove_legacy_vga_devices(struct pci_dev *pdev);
 
 int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name);
+
+int devm_aperture_remove_conflicting_pci_devices(struct pci_dev *pdev,
+						 const char *name);
+void devm_aperture_remove_conflicting_pci_devices_done(struct pci_dev *pdev);
 #else
 static inline int devm_aperture_acquire_for_platform_device(struct platform_device *pdev,
 							    resource_size_t base,
@@ -42,6 +46,16 @@ static inline int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev,
 {
 	return 0;
 }
+
+static inline int devm_aperture_remove_conflicting_pci_devices(struct pci_dev *pdev,
+							       const char *name)
+{
+	return 0;
+}
+
+static inline void devm_aperture_remove_conflicting_pci_devices_done(struct pci_dev *pdev)
+{
+}
 #endif
 
 /**
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index b449665c686a..c0ade38bcf99 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -63,6 +63,7 @@ struct efifb_dmi_info {
 #ifdef CONFIG_SYSFB
 
 void sysfb_disable(struct device *dev);
+int sysfb_restore(void);
 
 bool sysfb_handles_screen_info(void);
 
@@ -72,6 +73,11 @@ static inline void sysfb_disable(struct device *dev)
 {
 }
 
+static inline int sysfb_restore(void)
+{
+	return -ENODEV;
+}
+
 static inline bool sysfb_handles_screen_info(void)
 {
 	return false;
-- 
2.48.1



* [PATCH 02/12] drm/vmwgfx: Use devm aperture helpers for sysfb restore on probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC
  To: dri-devel
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 599052d07ae8..1b0fc4f9d4af 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1622,7 +1622,12 @@ static int vmw_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct vmw_private *vmw;
 	int ret;
 
-	ret = aperture_remove_conflicting_pci_devices(pdev, driver.name);
+	/*
+	 * Use devm variant to automatically restore sysfb if probe fails.
+	 * This ensures the user doesn't lose display if our probe fails
+	 * after removing the firmware framebuffer (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, driver.name);
 	if (ret)
 		goto out_error;
 
@@ -1647,6 +1652,12 @@ static int vmw_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (ret)
 		goto out_unload;
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	vmw_fifo_resource_inc(vmw);
 	vmw_svga_enable(vmw);
 	drm_client_setup(&vmw->drm, NULL);
-- 
2.48.1



* [PATCH 03/12] drm/xe: Use devm aperture helpers for sysfb restore on probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC
  To: dri-devel
  Cc: Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	David Airlie, Simona Vetter, intel-xe, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: intel-xe@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/xe/xe_device.c | 7 ++++++-
 drivers/gpu/drm/xe/xe_pci.c    | 7 +++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c7d373c70f0f..ee9ae73222d9 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -428,7 +428,12 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 
 	xe_display_driver_set_hooks(&driver);
 
-	err = aperture_remove_conflicting_pci_devices(pdev, driver.name);
+	/*
+	 * Use devm variant to automatically restore sysfb if probe fails.
+	 * This ensures the user doesn't lose display if our probe fails
+	 * after removing the firmware framebuffer (efifb/simpledrm).
+	 */
+	err = devm_aperture_remove_conflicting_pci_devices(pdev, driver.name);
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 9c9ea10d994c..ee08a09fda6a 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -6,6 +6,7 @@
 #include "xe_pci.h"
 
 #include <kunit/static_stub.h>
+#include <linux/aperture.h>
 #include <linux/device/driver.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -1058,6 +1059,12 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	drm_dbg(&xe->drm, "d3cold: capable=%s\n",
 		str_yes_no(xe->d3cold.capable));
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 err_driver_cleanup:
-- 
2.48.1



* [PATCH 04/12] drm/amdgpu: Use devm aperture helpers for sysfb restore on probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Lijo Lazar, Hawking Zhang, Mario Limonciello, Ce Sun,
	Mario Limonciello (AMD), Timur Kristóf, Vitaly Prosyak,
	amd-gfx, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

The aperture removal only applies to VGA and display class devices,
matching the existing behavior. This ensures users don't lose
display output if the amdgpu driver fails to probe after removing
the firmware framebuffer.
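
For reference, the class test that gates the aperture removal can be
written as a standalone predicate. pdev->class holds the 24-bit PCI
class code (base class, subclass, programming interface), so shifting
out the low byte leaves base class and subclass. The constant values
match include/linux/pci_ids.h; removes_firmware_fb() is a hypothetical
name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/linux/pci_ids.h. */
#define PCI_CLASS_DISPLAY_VGA   0x0300
#define PCI_CLASS_DISPLAY_3D    0x0302
#define PCI_CLASS_DISPLAY_OTHER 0x0380

/*
 * Hypothetical helper: true when amdgpu_device_init() would take the
 * aperture-removal branch (and so arm the sysfb restore action).
 * pci_class is the 24-bit class code as stored in pdev->class.
 */
static bool removes_firmware_fb(uint32_t pci_class)
{
	uint32_t base_sub = pci_class >> 8; /* drop the prog-if byte */

	return base_sub == PCI_CLASS_DISPLAY_VGA ||
	       base_sub == PCI_CLASS_DISPLAY_OTHER;
}
```

Because the restore action is only registered when this predicate is
true, whatever cancels it has to cope with the case where nothing was
registered; the virtio patch in this series tracks the equivalent state
explicitly with its sysfb_restore_registered flag.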

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Ce Sun <cesun102@amd.com>
Cc: "Mario Limonciello (AMD)" <superm1@kernel.org>
Cc: "Timur Kristóf" <timur.kristof@gmail.com>
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 7 +++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 58c3ffe707d1..6c867657225e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4737,8 +4737,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	 */
 	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA ||
 	    (pdev->class >> 8) == PCI_CLASS_DISPLAY_OTHER) {
-		/* Get rid of things like offb */
-		r = aperture_remove_conflicting_pci_devices(adev->pdev, amdgpu_kms_driver.name);
+		/*
+		 * Get rid of things like offb. Use devm variant to
+		 * automatically restore sysfb if probe fails. This ensures
+		 * the user doesn't lose display if our probe fails after
+		 * removing the firmware framebuffer (efifb/simpledrm).
+		 */
+		r = devm_aperture_remove_conflicting_pci_devices(adev->pdev, amdgpu_kms_driver.name);
 		if (r)
 			return r;
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2dfbddcef9ab..fc2d2dbaebe8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -32,6 +32,7 @@
 #include <drm/drm_probe_helper.h>
 #include <drm/drm_vblank.h>
 
+#include <linux/aperture.h>
 #include <linux/cc_platform.h>
 #include <linux/dynamic_debug.h>
 #include <linux/module.h>
@@ -2528,6 +2529,12 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 			amdgpu_get_secondary_funcs(adev);
 	}
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 err_pci:
-- 
2.48.1



* [PATCH 05/12] drm/virtio: Add sysfb restore on probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC
  To: dri-devel
  Cc: David Airlie, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, virtualization, linux-kernel

Register a devm action on the virtio device to restore the system
framebuffer (efifb/simpledrm) if the driver's probe fails after
removing the firmware framebuffer.

Unlike PCI drivers, virtio-gpu cannot use the
devm_aperture_remove_conflicting_pci_devices() helper because the
PCI device is managed by the virtio-pci driver, not by virtio-gpu.
When virtio-gpu probe fails, the PCI device remains bound to
virtio-pci, so devm actions registered on the PCI device won't fire.

Instead, register the sysfb restore action on the virtio device
(&vdev->dev), whose devm actions are released if virtio-gpu probe
fails. Cancel the action after successful probe, since the driver is
now responsible for display output.

This only applies to VGA devices where aperture_remove_conflicting_pci_devices()
is called to remove the firmware framebuffer.
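
The scoping argument above can be illustrated with a user-space model
of two devices: the PCI device bound to virtio-pci and its virtio child
probed by virtio-gpu. struct model_dev and the model_* helpers are
hypothetical; the one rule taken from the driver core is that a failed
probe releases the devres of the device being probed, not of its
still-bound parent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: at most one pending sysfb restore per device. */
struct model_dev {
	const char *name;
	bool restore_armed;
};

static int restores; /* how many times sysfb_restore() would have run */

/* Driver core on probe failure: release devres of the probed device. */
static void model_release_devres(struct model_dev *dev)
{
	if (dev->restore_armed) {
		dev->restore_armed = false;
		restores++;
	}
}

/*
 * Hypothetical virtio-gpu probe that fails after removing the firmware
 * framebuffer. owner is the device the restore action is registered on.
 */
static void model_virtio_gpu_probe_fails(struct model_dev *vdev,
					 struct model_dev *owner)
{
	owner->restore_armed = true; /* devm_add_action_or_reset(owner) */
	/* ... a later init step fails ... */
	model_release_devres(vdev);  /* only the virtio device is released */
}
```

Registering on the PCI device leaves the action pending until
virtio-pci itself unbinds, so the display stays dark; registering on the
virtio device restores it as soon as virtio-gpu's probe fails.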

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: David Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Cc: Chia-I Wu <olvaffe@gmail.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: virtualization@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/virtio/virtgpu_drv.c | 29 ++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index a5ce96fb8a1d..13cc8396fc78 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -30,6 +30,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/poll.h>
+#include <linux/sysfb.h>
 #include <linux/vgaarb.h>
 #include <linux/wait.h>
 
@@ -52,6 +53,11 @@ static int virtio_gpu_modeset = -1;
 MODULE_PARM_DESC(modeset, "Disable/Enable modesetting");
 module_param_named(modeset, virtio_gpu_modeset, int, 0400);
 
+static void virtio_gpu_restore_sysfb(void *unused)
+{
+	sysfb_restore();
+}
+
 static int virtio_gpu_pci_quirk(struct drm_device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -75,6 +81,7 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
 {
 	struct drm_device *dev;
 	int ret;
+	bool sysfb_restore_registered = false;
 
 	if (drm_firmware_drivers_only() && virtio_gpu_modeset == -1)
 		return -EINVAL;
@@ -97,6 +104,21 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
 		ret = virtio_gpu_pci_quirk(dev);
 		if (ret)
 			goto err_free;
+
+		/*
+		 * For VGA devices, register sysfb restore on the virtio device.
+		 * We can't use devm_aperture_remove_conflicting_pci_devices()
+		 * because the PCI device is managed by virtio-pci, not us.
+		 * Register on &vdev->dev so it fires if our probe fails.
+		 */
+		if (pci_is_vga(to_pci_dev(vdev->dev.parent))) {
+			ret = devm_add_action_or_reset(&vdev->dev,
+						       virtio_gpu_restore_sysfb,
+						       NULL);
+			if (ret)
+				goto err_free;
+			sysfb_restore_registered = true;
+		}
 	}
 
 	dma_set_max_seg_size(dev->dev, dma_max_mapping_size(dev->dev) ?: UINT_MAX);
@@ -110,6 +132,13 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
 
 	drm_client_setup(vdev->priv, NULL);
 
+	/*
+	 * Probe succeeded - cancel sysfb restore. We're now responsible
+	 * for display output.
+	 */
+	if (sysfb_restore_registered)
+		devm_remove_action(&vdev->dev, virtio_gpu_restore_sysfb, NULL);
+
 	return 0;
 
 err_deinit:
-- 
2.48.1



* [PATCH 06/12] drm/nouveau: Use devm aperture helpers for sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (4 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 05/12] drm/virtio: Add " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2025-12-29 21:58 ` [PATCH 07/12] drm/qxl: " Zack Rusin
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Lyude Paul, Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, nouveau,
	linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

This ensures users don't lose display output if the nouveau driver
fails to probe after removing the firmware framebuffer.
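The register-then-cancel flow described above can be sketched as a small C model that injects a failure at each probe step in turn. The helper names are hypothetical; the real devm_aperture_remove_conflicting_pci_devices() and its _done() counterpart are introduced elsewhere in the series and may differ in detail. As a modeling choice, a failure at the removal step itself is treated as happening before anything was removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state for one PCI device. */
struct model_pci_dev {
	bool restore_armed;	/* is a restore action pending? */
	int restores;		/* simulated sysfb restores performed */
};

/* Stand-in for the devm removal helper: remove FB and arm the restore. */
static void model_remove_conflicting(struct model_pci_dev *pdev)
{
	pdev->restore_armed = true;
}

/* Stand-in for the _done() helper: disarm after a successful probe. */
static void model_remove_conflicting_done(struct model_pci_dev *pdev)
{
	pdev->restore_armed = false;
}

/* Driver-core cleanup after a failed probe. */
static void model_release(struct model_pci_dev *pdev)
{
	if (pdev->restore_armed) {
		pdev->restore_armed = false;
		pdev->restores++;
	}
}

enum { STEP_ENABLE, STEP_REMOVE_FB, STEP_INIT_HW, STEP_REGISTER, NR_STEPS };

/* Run the probe steps in order; fail_at == NR_STEPS means "never fail". */
static int model_probe(struct model_pci_dev *pdev, int fail_at)
{
	for (int step = 0; step < NR_STEPS; step++) {
		if (step == fail_at)
			return -1;
		if (step == STEP_REMOVE_FB)
			model_remove_conflicting(pdev);
	}
	model_remove_conflicting_done(pdev);
	return 0;
}

static int model_bind(struct model_pci_dev *pdev, int fail_at)
{
	int ret = model_probe(pdev, fail_at);

	if (ret)
		model_release(pdev);
	return ret;
}
```

Iterating fail_at from 0 to NR_STEPS should yield exactly one restore for failures after the removal step and none otherwise.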

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 1527b801f013..7211ec6cdcc9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -871,8 +871,13 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 	if (ret)
 		return ret;
 
-	/* Remove conflicting drivers (vesafb, efifb etc). */
-	ret = aperture_remove_conflicting_pci_devices(pdev, driver_pci.name);
+	/*
+	 * Remove conflicting drivers (vesafb, efifb etc). Use devm variant
+	 * to automatically restore sysfb if probe fails. This ensures the
+	 * user doesn't lose display if our probe fails after removing the
+	 * firmware framebuffer (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, driver_pci.name);
 	if (ret)
 		return ret;
 
@@ -903,6 +908,13 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 	drm_client_setup(drm->dev, format);
 
 	quirk_broken_nv_runpm(pdev);
+
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 fail_pci:
-- 
2.48.1



* [PATCH 07/12] drm/qxl: Use devm aperture helpers for sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (5 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 06/12] drm/nouveau: Use devm aperture helpers for " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2025-12-29 21:58 ` [PATCH 08/12] drm/vboxvideo: " Zack Rusin
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Dave Airlie, Gerd Hoffmann, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, virtualization,
	spice-devel, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

This ensures users don't lose display output if the qxl driver
fails to probe after removing the firmware framebuffer.
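One detail worth keeping in mind for drivers like qxl that still unwind with gotos: as far as I understand the driver core (really_probe() in drivers/base/dd.c), devres is released only after the probe callback has returned its error. The driver's own cleanup, such as the disable_pci path, therefore runs before the sysfb restore. The following C sketch models that ordering with a hypothetical event log; none of its functions are kernel APIs.

```c
#include <assert.h>
#include <string.h>

static char log_buf[128];	/* comma-separated record of events */
static int armed;		/* is the sysfb restore pending? */

static void log_event(const char *ev)
{
	if (log_buf[0])
		strcat(log_buf, ",");
	strcat(log_buf, ev);
}

/* Hypothetical qxl-like probe with goto-based unwinding. */
static int model_qxl_probe(int fail_after_remove)
{
	log_event("enable_pci");
	armed = 1;			/* devm removal helper arms the restore */
	log_event("remove_fb");
	if (fail_after_remove)
		goto disable_pci;
	armed = 0;			/* _done(): cancel the restore */
	return 0;

disable_pci:
	log_event("disable_pci");	/* driver's own unwinding runs first */
	return -1;
}

/* Driver-core side: devres is released after probe has returned. */
static int model_bind(int fail_after_remove)
{
	int ret;

	log_buf[0] = '\0';
	ret = model_qxl_probe(fail_after_remove);
	if (ret && armed) {
		armed = 0;
		log_event("sysfb_restore");
	}
	return ret;
}
```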

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: virtualization@lists.linux.dev
Cc: spice-devel@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/qxl/qxl_drv.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 2bbb1168a3ff..ca4c817fd611 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -93,7 +93,12 @@ qxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (ret)
 		return ret;
 
-	ret = aperture_remove_conflicting_pci_devices(pdev, qxl_driver.name);
+	/*
+	 * Use devm variant to automatically restore sysfb if probe fails.
+	 * This ensures the user doesn't lose display if our probe fails
+	 * after removing the firmware framebuffer (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, qxl_driver.name);
 	if (ret)
 		goto disable_pci;
 
@@ -121,6 +126,13 @@ qxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto modeset_cleanup;
 
 	drm_client_setup(&qdev->ddev, NULL);
+
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 modeset_cleanup:
-- 
2.48.1



* [PATCH 08/12] drm/vboxvideo: Use devm aperture helpers for sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (6 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 07/12] drm/qxl: " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2025-12-29 21:58 ` [PATCH 09/12] drm/hyperv: Add " Zack Rusin
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Hans de Goede, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

This ensures users don't lose display output if the vboxvideo driver
fails to probe after removing the firmware framebuffer.

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Hans de Goede <hansg@kernel.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/vboxvideo/vbox_drv.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vboxvideo/vbox_drv.c b/drivers/gpu/drm/vboxvideo/vbox_drv.c
index bb861f0a0a31..569fd7b60115 100644
--- a/drivers/gpu/drm/vboxvideo/vbox_drv.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c
@@ -46,7 +46,12 @@ static int vbox_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (!vbox_check_supported(VBE_DISPI_ID_HGSMI))
 		return -ENODEV;
 
-	ret = aperture_remove_conflicting_pci_devices(pdev, driver.name);
+	/*
+	 * Use devm variant to automatically restore sysfb if probe fails.
+	 * This ensures the user doesn't lose display if our probe fails
+	 * after removing the firmware framebuffer (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, driver.name);
 	if (ret)
 		return ret;
 
@@ -84,6 +89,12 @@ static int vbox_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	drm_client_setup(&vbox->ddev, NULL);
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 err_irq_fini:
-- 
2.48.1



* [PATCH 09/12] drm/hyperv: Add sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (7 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 08/12] drm/vboxvideo: " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2025-12-29 21:58 ` [PATCH 10/12] drm/ast: Use devm aperture helpers for " Zack Rusin
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Deepak Rawat, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, linux-hyperv, linux-kernel

Register a devm action on the vmbus device to restore the system
framebuffer (efifb/simpledrm) if the driver's probe fails after
removing the firmware framebuffer.

Unlike PCI drivers, hyperv cannot use the
devm_aperture_remove_conflicting_pci_devices() helper because this
is a vmbus device, not a PCI device. Instead, register the sysfb
restore action on the hv device (&hdev->device), where it runs if
probe fails. Cancel the action after successful probe since the
driver is now responsible for display output.

This ensures users don't lose display output if the hyperv driver
fails to probe after removing the firmware framebuffer.
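The hyperv change relies on the "_or_reset" contract of devm_add_action_or_reset(): if registering the action fails, the action is run immediately, so even an allocation failure right after aperture_remove_all_conflicting_devices() restores sysfb. A minimal C model of the three outcomes follows. The knob alloc_fails and all model_* names are hypothetical, and the error values merely stand in for -ENOMEM and a later probe error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*action_fn)(void);

static int restores;		/* simulated sysfb_restore() calls */
static bool alloc_fails;	/* hypothetical knob: make registration fail */
static action_fn armed_fn;	/* the one pending devres-style action */

static void model_restore_sysfb(void)
{
	restores++;
}

/* Mirrors the "_or_reset" contract: on registration failure, run fn now. */
static int model_add_action_or_reset(action_fn fn)
{
	if (alloc_fails) {
		fn();
		return -12;	/* stands in for -ENOMEM */
	}
	armed_fn = fn;
	return 0;
}

static int model_hyperv_probe(bool fail_late)
{
	int ret;

	/* the firmware framebuffer has already been removed at this point */
	ret = model_add_action_or_reset(model_restore_sysfb);
	if (ret)
		return ret;
	if (fail_late)
		return -22;	/* stands in for e.g. a VRAM setup failure */
	armed_fn = NULL;	/* like devm_remove_action(): cancel, don't run */
	return 0;
}

/* Driver-core side: run the pending action only if probe failed. */
static int model_bind(bool fail_late)
{
	int ret = model_hyperv_probe(fail_late);

	if (ret && armed_fn) {
		action_fn fn = armed_fn;

		armed_fn = NULL;
		fn();
	}
	return ret;
}
```

In every failure case the restore runs exactly once, whether it is triggered by the registration itself or later by the driver core.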

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Deepak Rawat <drawat.floss@gmail.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: linux-hyperv@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
index 06b5d96e6eaf..6d66cd243bab 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
@@ -8,6 +8,7 @@
 #include <linux/hyperv.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/sysfb.h>
 
 #include <drm/clients/drm_client_setup.h>
 #include <drm/drm_atomic_helper.h>
@@ -102,6 +103,11 @@ static int hyperv_setup_vram(struct hyperv_drm_device *hv,
 	return ret;
 }
 
+static void hyperv_restore_sysfb(void *unused)
+{
+	sysfb_restore();
+}
+
 static int hyperv_vmbus_probe(struct hv_device *hdev,
 			      const struct hv_vmbus_device_id *dev_id)
 {
@@ -127,6 +133,17 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
 
 	aperture_remove_all_conflicting_devices(hyperv_driver.name);
 
+	/*
+	 * Register sysfb restore on the hv device. We can't use
+	 * devm_aperture_remove_conflicting_pci_devices() because this
+	 * is a vmbus device, not a PCI device. Register on &hdev->device
+	 * so it fires if our probe fails after removing firmware FB.
+	 */
+	ret = devm_add_action_or_reset(&hdev->device, hyperv_restore_sysfb,
+				       NULL);
+	if (ret)
+		goto err_vmbus_close;
+
 	ret = hyperv_setup_vram(hv, hdev);
 	if (ret)
 		goto err_vmbus_close;
@@ -152,6 +169,12 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
 
 	drm_client_setup(dev, NULL);
 
+	/*
+	 * Probe succeeded - cancel sysfb restore. We're now responsible
+	 * for display output.
+	 */
+	devm_remove_action(&hdev->device, hyperv_restore_sysfb, NULL);
+
 	return 0;
 
 err_free_mmio:
-- 
2.48.1



* [PATCH 10/12] drm/ast: Use devm aperture helpers for sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (8 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 09/12] drm/hyperv: Add " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2025-12-29 21:58 ` [PATCH 11/12] drm/radeon: " Zack Rusin
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Dave Airlie, Thomas Zimmermann, Jocelyn Falempe,
	Maarten Lankhorst, Maxime Ripard, David Airlie, Simona Vetter,
	linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

This ensures users don't lose display output if the ast driver
fails to probe after removing the firmware framebuffer.

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/ast/ast_drv.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.c b/drivers/gpu/drm/ast/ast_drv.c
index b9a9b050b546..8e6c7cbafa59 100644
--- a/drivers/gpu/drm/ast/ast_drv.c
+++ b/drivers/gpu/drm/ast/ast_drv.c
@@ -310,7 +310,12 @@ static int ast_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct drm_device *drm;
 	bool need_post = false;
 
-	ret = aperture_remove_conflicting_pci_devices(pdev, ast_driver.name);
+	/*
+	 * Use devm variant to automatically restore sysfb if probe fails.
+	 * This ensures the user doesn't lose display if our probe fails
+	 * after removing the firmware framebuffer (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, ast_driver.name);
 	if (ret)
 		return ret;
 
@@ -426,6 +431,12 @@ static int ast_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	drm_client_setup(drm, NULL);
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 }
 
-- 
2.48.1



* [PATCH 11/12] drm/radeon: Use devm aperture helpers for sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (9 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 10/12] drm/ast: Use devm aperture helpers for " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2025-12-29 21:58 ` [PATCH 12/12] drm/i915: " Zack Rusin
  2026-01-09 10:34 ` [PATCH 00/12] Recover sysfb after DRM " Thomas Zimmermann
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Alex Deucher, Christian König, David Airlie, Simona Vetter,
	amd-gfx, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

This ensures users don't lose display output if the radeon driver
fails to probe after removing the firmware framebuffer.

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_drv.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 87fd6255c114..225f716d5db9 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -324,8 +324,13 @@ static int radeon_pci_probe(struct pci_dev *pdev,
 	if (vga_switcheroo_client_probe_defer(pdev))
 		return -EPROBE_DEFER;
 
-	/* Get rid of things like offb */
-	ret = aperture_remove_conflicting_pci_devices(pdev, kms_driver.name);
+	/*
+	 * Get rid of things like offb. Use devm variant to automatically
+	 * restore sysfb if probe fails. This ensures the user doesn't lose
+	 * display if our probe fails after removing the firmware framebuffer
+	 * (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, kms_driver.name);
 	if (ret)
 		return ret;
 
@@ -361,6 +366,12 @@ static int radeon_pci_probe(struct pci_dev *pdev,
 
 	drm_client_setup(ddev, format);
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 err:
-- 
2.48.1



* [PATCH 12/12] drm/i915: Use devm aperture helpers for sysfb restore on probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (10 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 11/12] drm/radeon: " Zack Rusin
@ 2025-12-29 21:58 ` Zack Rusin
  2026-01-09 10:34 ` [PATCH 00/12] Recover sysfb after DRM " Thomas Zimmermann
  12 siblings, 0 replies; 28+ messages in thread
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
  To: dri-devel
  Cc: Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	David Airlie, Simona Vetter, intel-gfx, linux-kernel

Use devm_aperture_remove_conflicting_pci_devices() instead of the
non-devm variant to automatically restore the system framebuffer
(efifb/simpledrm) if the driver's probe fails after removing the
firmware framebuffer.

Call devm_aperture_remove_conflicting_pci_devices_done() after
successful probe to cancel the automatic restore, as the driver
is now responsible for display output.

This ensures users don't lose display output if the i915 driver
fails to probe after removing the firmware framebuffer.

Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_driver.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index c97b76771917..f9efeb825064 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -506,7 +506,12 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
 	if (ret)
 		goto err_perf;
 
-	ret = aperture_remove_conflicting_pci_devices(pdev, dev_priv->drm.driver->name);
+	/*
+	 * Use devm variant to automatically restore sysfb if probe fails.
+	 * This ensures the user doesn't lose display if our probe fails
+	 * after removing the firmware framebuffer (efifb/simpledrm).
+	 */
+	ret = devm_aperture_remove_conflicting_pci_devices(pdev, dev_priv->drm.driver->name);
 	if (ret)
 		goto err_ggtt;
 
@@ -866,6 +871,12 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	i915->do_release = true;
 
+	/*
+	 * Probe succeeded - cancel the automatic sysfb restore action.
+	 * We're now responsible for display output.
+	 */
+	devm_aperture_remove_conflicting_pci_devices_done(pdev);
+
 	return 0;
 
 out_cleanup_gem:
-- 
2.48.1



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
                   ` (11 preceding siblings ...)
  2025-12-29 21:58 ` [PATCH 12/12] drm/i915: " Zack Rusin
@ 2026-01-09 10:34 ` Thomas Zimmermann
  2026-01-10  4:52   ` Zack Rusin
  12 siblings, 1 reply; 28+ messages in thread
From: Thomas Zimmermann @ 2026-01-09 10:34 UTC (permalink / raw)
  To: Zack Rusin, dri-devel
  Cc: Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
	Christian König, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

Hi

Am 29.12.25 um 22:58 schrieb Zack Rusin:
> Almost a rite of passage for every DRM developer and most Linux users
> is upgrading your DRM driver/updating boot flags/changing some config
> and having the DRM driver fail at probe, resulting in a blank screen.
>
> Currently there's no way to recover from DRM driver probe failure. PCI
> DRM drivers explicitly throw out the existing sysfb to get exclusive
> access to PCI resources, so if the probe fails the system is left
> without a functioning display driver.
>
> Add code to sysfb to recover the system framebuffer when a DRM driver's
> probe fails. This means that a DRM driver that fails to load reloads
> the system framebuffer driver.
>
> This works best with simpledrm. Without it Xorg won't recover because
> it still tries to load the vendor-specific driver, which usually ends
> up not working at all. With simpledrm the system recovers really nicely,
> ending up with a working console and not a blank screen.
>
> There's a caveat in that some hardware might require some special magic
> register write to recover EFI display. I'd appreciate it a lot if
> maintainers could introduce a temporary failure in their driver's
> probe to validate that the sysfb recovers and they get a working console.
> The easiest way to double check it is by adding:
>   /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>   dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>   ret = -EINVAL;
>   goto out_error;
> or such right after the devm_aperture_remove_conflicting_pci_devices() call.

Recovering the display like that is guesswork and will at best work
with simple discrete devices where the framebuffer is always located in 
a confined graphics aperture.

But the problem you're trying to solve is a real one.

What we'd want to do instead is to take the initial hardware state into 
account when we do the initial mode-setting operation.

The first step is to move each driver's remove_conflicting_devices call 
to the latest possible location in the probe function. We usually do it 
first, because that's easy. But on most hardware, it could happen much 
later. The native driver is free to examine hardware state while probing 
the device as long as it does not interfere with the pre-configured 
framebuffer mode/format/address. Hence it can set up its internal
structures while the sysfb device is still active.
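A toy C model of why moving the removal later helps: it counts only which failure points leave the machine without any display driver, assuming no restore mechanism exists. The step names and both orderings are hypothetical, and the model says nothing about which hardware reads are actually safe while sysfb is active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, coarse-grained probe steps used only for this model. */
enum step { READ_HW_STATE, ALLOC_STRUCTS, REMOVE_SYSFB, MODESET, NR_STEPS };

/*
 * Run the steps in 'order', aborting at step 'fail' (NR_STEPS means no
 * failure). Returns true if some driver is still driving the display
 * afterwards, assuming there is no sysfb restore mechanism at all.
 */
static bool display_survives(const enum step *order, int n, enum step fail)
{
	bool sysfb_alive = true;

	for (int i = 0; i < n; i++) {
		if (order[i] == fail)
			return sysfb_alive;	/* native driver never came up */
		if (order[i] == REMOVE_SYSFB)
			sysfb_alive = false;
	}
	return true;	/* probe finished: native driver owns the display */
}

static const enum step early_removal[] = {
	REMOVE_SYSFB, READ_HW_STATE, ALLOC_STRUCTS, MODESET,
};
static const enum step late_removal[] = {
	READ_HW_STATE, ALLOC_STRUCTS, REMOVE_SYSFB, MODESET,
};
```

With late removal, failures while reading hardware state or allocating structures leave sysfb untouched; a failure during the modeset step still loses the display.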

The next step for the native driver is to load the pre-configured 
hardware state into its initial internal atomic state. Maxime has worked 
on that on and off. The last iteration I'm aware of is at [1].

After the state-readout, the sysfb device has to be unplugged. But as 
the underlying hardware config remains active, the native driver can now 
use and modify it. We currently do a drm_mode_config_reset(), which 
clears the state and then let the first client set a new display state. 
But with state-readout, we could either pick up the existing framebuffer 
directly or do a proper modeset from existing state.
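To illustrate the difference between resetting and reading out the initial state, here is a deliberately simplified C sketch. The structs and functions are hypothetical and far simpler than DRM atomic state; the sketch only shows that a readout-based initial state lets a first client recognize that the firmware mode is already programmed, while a reset state forces a full modeset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of what firmware left programmed in the hardware. */
struct hw_scanout {
	uint32_t width, height, pitch;
	uint64_t fb_addr;
	int enabled;
};

/* Hypothetical driver-side initial state (stand-in for atomic state). */
struct initial_state {
	struct hw_scanout primary;
	int from_readout;	/* 1 if copied from hardware, 0 if reset */
};

/* Reset-style init: forget whatever the firmware configured. */
static void state_reset(struct initial_state *s)
{
	memset(s, 0, sizeof(*s));
}

/* Readout-style init: adopt the firmware configuration as the start. */
static void state_readout(struct initial_state *s, const struct hw_scanout *hw)
{
	s->primary = *hw;
	s->from_readout = 1;
}

/* A first client could skip a full modeset if the requested mode matches. */
static int needs_full_modeset(const struct initial_state *s, uint32_t w, uint32_t h)
{
	return !(s->primary.enabled && s->primary.width == w &&
		 s->primary.height == h);
}
```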

As DRM clients control the mode setting, they'd likely need some changes 
to handle state-readout. There's such code in i915's fbdev support AFAIK.

Best regards
Thomas

[1] 
https://lore.kernel.org/dri-devel/20250902-drm-state-readout-v1-0-14ad5315da3f@kernel.org/

>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Ce Sun <cesun102@amd.com>
> Cc: Chia-I Wu <olvaffe@gmail.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Deepak Rawat <drawat.floss@gmail.com>
> Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Gurchetan Singh <gurchetansingh@chromium.org>
> Cc: Hans de Goede <hansg@kernel.org>
> Cc: Hawking Zhang <Hawking.Zhang@amd.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: intel-gfx@lists.freedesktop.org
> Cc: intel-xe@lists.freedesktop.org
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Javier Martinez Canillas <javierm@redhat.com>
> Cc: Jocelyn Falempe <jfalempe@redhat.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lijo Lazar <lijo.lazar@amd.com>
> Cc: linux-efi@vger.kernel.org
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: "Mario Limonciello (AMD)" <superm1@kernel.org>
> Cc: Mario Limonciello <mario.limonciello@amd.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: nouveau@lists.freedesktop.org
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: spice-devel@lists.freedesktop.org
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: "Timur Kristóf" <timur.kristof@gmail.com>
> Cc: Tvrtko Ursulin <tursulin@ursulin.net>
> Cc: virtualization@lists.linux.dev
> Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
>
> Zack Rusin (12):
>    video/aperture: Add sysfb restore on DRM probe failure
>    drm/vmwgfx: Use devm aperture helpers for sysfb restore on probe
>      failure
>    drm/xe: Use devm aperture helpers for sysfb restore on probe failure
>    drm/amdgpu: Use devm aperture helpers for sysfb restore on probe
>      failure
>    drm/virtio: Add sysfb restore on probe failure
>    drm/nouveau: Use devm aperture helpers for sysfb restore on probe
>      failure
>    drm/qxl: Use devm aperture helpers for sysfb restore on probe failure
>    drm/vboxvideo: Use devm aperture helpers for sysfb restore on probe
>      failure
>    drm/hyperv: Add sysfb restore on probe failure
>    drm/ast: Use devm aperture helpers for sysfb restore on probe failure
>    drm/radeon: Use devm aperture helpers for sysfb restore on probe
>      failure
>    drm/i915: Use devm aperture helpers for sysfb restore on probe failure
>
>   drivers/firmware/efi/sysfb_efi.c           |   2 +-
>   drivers/firmware/sysfb.c                   | 191 +++++++++++++--------
>   drivers/firmware/sysfb_simplefb.c          |  10 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   7 +
>   drivers/gpu/drm/ast/ast_drv.c              |  13 +-
>   drivers/gpu/drm/hyperv/hyperv_drm_drv.c    |  23 +++
>   drivers/gpu/drm/i915/i915_driver.c         |  13 +-
>   drivers/gpu/drm/nouveau/nouveau_drm.c      |  16 +-
>   drivers/gpu/drm/qxl/qxl_drv.c              |  14 +-
>   drivers/gpu/drm/radeon/radeon_drv.c        |  15 +-
>   drivers/gpu/drm/vboxvideo/vbox_drv.c       |  13 +-
>   drivers/gpu/drm/virtio/virtgpu_drv.c       |  29 ++++
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c        |  13 +-
>   drivers/gpu/drm/xe/xe_device.c             |   7 +-
>   drivers/gpu/drm/xe/xe_pci.c                |   7 +
>   drivers/video/aperture.c                   |  54 ++++++
>   include/linux/aperture.h                   |  14 ++
>   include/linux/sysfb.h                      |   6 +
>   19 files changed, 368 insertions(+), 88 deletions(-)
>

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)




* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-09 10:34 ` [PATCH 00/12] Recover sysfb after DRM " Thomas Zimmermann
@ 2026-01-10  4:52   ` Zack Rusin
  2026-01-15 11:02     ` Thomas Zimmermann
  0 siblings, 1 reply; 28+ messages in thread
From: Zack Rusin @ 2026-01-10  4:52 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
	Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
	Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
	Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
	Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
	linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak


On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 29.12.25 um 22:58 schrieb Zack Rusin:
> > Almost a rite of passage for every DRM developer and most Linux users
> > is upgrading your DRM driver/updating boot flags/changing some config
> > and having DRM driver fail at probe resulting in a blank screen.
> >
> > Currently there's no way to recover from DRM driver probe failure. PCI
> > DRM drivers explicitly throw out the existing sysfb to get exclusive
> > access to PCI resources so if the probe fails the system is left without
> > a functioning display driver.
> >
> > Add code to sysfb to recover the system framebuffer when a DRM driver's probe
> > fails. This means that a DRM driver that fails to load reloads the system
> > framebuffer driver.
> >
> > This works best with simpledrm. Without it Xorg won't recover because
> > it still tries to load the vendor specific driver which ends up usually
> > not working at all. With simpledrm the system recovers really nicely
> > ending up with a working console and not a blank screen.
> >
> > There's a caveat in that some hardware might require some special magic
> > register write to recover EFI display. I'd appreciate it a lot if
> > maintainers could introduce a temporary failure in their drivers
> > probe to validate that the sysfb recovers and they get a working console.
> > The easiest way to double check it is by adding:
> >   /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
> >   dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
> >   ret = -EINVAL;
> >   goto out_error;
> > or such right after the devm_aperture_remove_conflicting_pci_devices .
>
> Recovering the display like that is guess work and will at best work
> with simple discrete devices where the framebuffer is always located in
> a confined graphics aperture.
>
> But the problem you're trying to solve is a real one.
>
> What we'd want to do instead is to take the initial hardware state into
> account when we do the initial mode-setting operation.
>
> The first step is to move each driver's remove_conflicting_devices call
> to the latest possible location in the probe function. We usually do it
> first, because that's easy. But on most hardware, it could happen much
> later.

Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
they request PCI regions, which is going to fail otherwise. Because
grabbing the PCI resources is generally the very first thing that
those drivers need to do to set up anything, we call
remove_conflicting_devices first, or at least very early.
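To make the ordering argument concrete, here is a userspace model (not kernel code) of exclusive region ownership. It assumes, as described above, that the firmware framebuffer driver holds a claim overlapping the device's PCI BAR; the model_* names and the addresses are made up for illustration. The native driver's request fails until the firmware framebuffer's claim is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only: each range in this toy "resource tree" has at most
 * one owner, similar in spirit to request_mem_region()/pci_request_regions().
 * None of these model_* names exist in the kernel.
 */
struct model_region {
	unsigned long start, end; /* inclusive bounds */
	const char *owner;        /* NULL means the slot is unused */
};

#define MODEL_MAX_REGIONS 8
static struct model_region model_tree[MODEL_MAX_REGIONS];

static bool model_overlaps(const struct model_region *r,
			   unsigned long start, unsigned long end)
{
	return r->owner && start <= r->end && r->start <= end;
}

/* Claim [start, end] for owner; fails if any busy range overlaps it. */
static bool model_request_region(unsigned long start, unsigned long end,
				 const char *owner)
{
	size_t free_slot = MODEL_MAX_REGIONS;

	for (size_t i = 0; i < MODEL_MAX_REGIONS; i++) {
		if (model_overlaps(&model_tree[i], start, end))
			return false;
		if (!model_tree[i].owner && free_slot == MODEL_MAX_REGIONS)
			free_slot = i;
	}
	if (free_slot == MODEL_MAX_REGIONS)
		return false; /* model tree is full */
	model_tree[free_slot] = (struct model_region){ start, end, owner };
	return true;
}

/* Stand-in for kicking out the firmware framebuffer: drop all its claims. */
static void model_remove_conflicting(const char *owner)
{
	for (size_t i = 0; i < MODEL_MAX_REGIONS; i++)
		if (model_tree[i].owner && strcmp(model_tree[i].owner, owner) == 0)
			model_tree[i].owner = NULL;
}
```

In a real PCI driver the corresponding calls would be aperture_remove_conflicting_pci_devices() followed by pci_request_regions(); the model only shows why that order is forced once regions are requested up front.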

I also don't think it's possible, or even desirable, for some drivers to
reuse the initial state. A good example here is vmwgfx: by default
some people set up their VMs with e.g. 8 MB of VRAM. When vmwgfx
loads, we allow scanning out from system memory, so you can set your VM
up with 8 MB of VRAM but still use 4K resolutions once the driver
loads. This way the suspend size of the VM is very predictable (tiny
VRAM plus whatever RAM was set up) while still allowing a lot of
flexibility.

In general, I think that however this is planned, it's two or three separate series:
1) infrastructure to reload the sysfb driver (what this series is)
2) making sure that drivers that do want to recover cleanly actually
clean up all their state properly on exit,
3) abstracting at least some of that cleanup in a driver-independent way

z



* Re: [PATCH 05/12] drm/virtio: Add sysfb restore on probe failure
  2025-12-29 21:58 ` [PATCH 05/12] drm/virtio: Add " Zack Rusin
@ 2026-01-15 10:12   ` Dmitry Osipenko
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Osipenko @ 2026-01-15 10:12 UTC (permalink / raw)
  To: Zack Rusin, dri-devel
  Cc: David Airlie, Gerd Hoffmann, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, virtualization, linux-kernel

On 12/30/25 00:58, Zack Rusin wrote:
> Register a devm action on the virtio device to restore the system
> framebuffer (efifb/simpledrm) if the driver's probe fails after
> removing the firmware framebuffer.
> 
> Unlike PCI drivers, virtio-gpu cannot use the
> devm_aperture_remove_conflicting_pci_devices() helper because the
> PCI device is managed by the virtio-pci driver, not by virtio-gpu.
> When virtio-gpu probe fails, the PCI device remains bound to
> virtio-pci, so devm actions registered on the PCI device won't fire.
> 
> Instead, register the sysfb restore action on the virtio device
> (&vdev->dev) which will be released if virtio-gpu probe fails.
> Cancel the action after successful probe since the driver is now
> responsible for display output.
> 
> This only applies to VGA devices where aperture_remove_conflicting_pci_devices()
> is called to remove the firmware framebuffer.
> 
> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
> Cc: David Airlie <airlied@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Cc: Gurchetan Singh <gurchetansingh@chromium.org>
> Cc: Chia-I Wu <olvaffe@gmail.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> Cc: virtualization@lists.linux.dev
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.c | 29 ++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index a5ce96fb8a1d..13cc8396fc78 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -30,6 +30,7 @@
>  #include <linux/module.h>
>  #include <linux/pci.h>
>  #include <linux/poll.h>
> +#include <linux/sysfb.h>
>  #include <linux/vgaarb.h>
>  #include <linux/wait.h>
>  
> @@ -52,6 +53,11 @@ static int virtio_gpu_modeset = -1;
>  MODULE_PARM_DESC(modeset, "Disable/Enable modesetting");
>  module_param_named(modeset, virtio_gpu_modeset, int, 0400);
>  
> +static void virtio_gpu_restore_sysfb(void *unused)
> +{
> +	sysfb_restore();
> +}
> +
>  static int virtio_gpu_pci_quirk(struct drm_device *dev)
>  {
>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
> @@ -75,6 +81,7 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev;
>  	int ret;
> +	bool sysfb_restore_registered = false;
>  
>  	if (drm_firmware_drivers_only() && virtio_gpu_modeset == -1)
>  		return -EINVAL;
> @@ -97,6 +104,21 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>  		ret = virtio_gpu_pci_quirk(dev);
>  		if (ret)
>  			goto err_free;
> +
> +		/*
> +		 * For VGA devices, register sysfb restore on the virtio device.
> +		 * We can't use devm_aperture_remove_conflicting_pci_devices()
> +		 * because the PCI device is managed by virtio-pci, not us.
> +		 * Register on &vdev->dev so it fires if our probe fails.
> +		 */
> +		if (pci_is_vga(to_pci_dev(vdev->dev.parent))) {
> +			ret = devm_add_action_or_reset(&vdev->dev,
> +						       virtio_gpu_restore_sysfb,
> +						       NULL);
> +			if (ret)
> +				goto err_free;
> +			sysfb_restore_registered = true;
> +		}
>  	}
>  
>  	dma_set_max_seg_size(dev->dev, dma_max_mapping_size(dev->dev) ?: UINT_MAX);
> @@ -110,6 +132,13 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>  
>  	drm_client_setup(vdev->priv, NULL);
>  
> +	/*
> +	 * Probe succeeded - cancel sysfb restore. We're now responsible
> +	 * for display output.
> +	 */
> +	if (sysfb_restore_registered)
> +		devm_remove_action(&vdev->dev, virtio_gpu_restore_sysfb, NULL);
> +
>  	return 0;
>  
>  err_deinit:
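The patch relies on two devres properties: actions registered on a device run (in reverse order) when that device's probe fails, and devm_remove_action() drops a registered action without calling it. The following is a userspace model of just that control flow under simplified semantics; model_dev, model_really_probe() and the other model_* names are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the devres-action pattern used by the patch above.
 * model_dev stands in for &vdev->dev; this is not the kernel's devres code.
 */
typedef void (*model_action_fn)(void *data);

#define MODEL_MAX_ACTIONS 8

struct model_dev {
	model_action_fn fn[MODEL_MAX_ACTIONS];
	void *data[MODEL_MAX_ACTIONS];
	int n;
};

/* Like devm_add_action_or_reset(): if registration fails, run the action now. */
static int model_add_action_or_reset(struct model_dev *d, model_action_fn fn, void *data)
{
	if (d->n == MODEL_MAX_ACTIONS) {
		fn(data);
		return -ENOMEM;
	}
	d->fn[d->n] = fn;
	d->data[d->n] = data;
	d->n++;
	return 0;
}

/* Like devm_remove_action(): forget the action without running it. */
static void model_remove_action(struct model_dev *d, model_action_fn fn, void *data)
{
	for (int i = d->n - 1; i >= 0; i--) {
		if (d->fn[i] == fn && d->data[i] == data) {
			for (int j = i; j < d->n - 1; j++) {
				d->fn[j] = d->fn[j + 1];
				d->data[j] = d->data[j + 1];
			}
			d->n--;
			return;
		}
	}
}

/* What the driver core does when probe fails: run actions in reverse order. */
static void model_release_all(struct model_dev *d)
{
	while (d->n > 0) {
		d->n--;
		d->fn[d->n](d->data[d->n]);
	}
}

static int model_restore_count;
static bool model_inject_failure;

static void model_restore_sysfb(void *unused)
{
	(void)unused;
	model_restore_count++; /* stands in for sysfb_restore() */
}

/* Mirrors the shape of virtio_gpu_probe() after the patch. */
static int model_gpu_probe(struct model_dev *d)
{
	int ret = model_add_action_or_reset(d, model_restore_sysfb, NULL);

	if (ret)
		return ret;
	if (model_inject_failure)
		return -EINVAL; /* e.g. the forced failure from the cover letter */
	model_remove_action(d, model_restore_sysfb, NULL); /* success: we own the display */
	return 0;
}

/* Stand-in for the driver core calling probe and cleaning up on failure. */
static int model_really_probe(struct model_dev *d)
{
	int ret = model_gpu_probe(d);

	if (ret)
		model_release_all(d);
	return ret;
}
```

Registering on &vdev->dev rather than on the PCI device matters because, as the commit message says, only the virtio device's probe fails; the PCI device stays bound to virtio-pci, so its devres would not be released.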

Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>

-- 
Best regards,
Dmitry



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-10  4:52   ` Zack Rusin
@ 2026-01-15 11:02     ` Thomas Zimmermann
  2026-01-15 14:39       ` Christian König
  2026-01-16  3:59       ` Zack Rusin
  0 siblings, 2 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2026-01-15 11:02 UTC (permalink / raw)
  To: Zack Rusin
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
	Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
	Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
	Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
	Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
	linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

Hi,

apologies for the delay. I wanted to reply and then forgot about it.

Am 10.01.26 um 05:52 schrieb Zack Rusin:
> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> Hi
>>
>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>> Almost a rite of passage for every DRM developer and most Linux users
>>> is upgrading your DRM driver/updating boot flags/changing some config
>>> and having DRM driver fail at probe resulting in a blank screen.
>>>
>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>> access to PCI resources so if the probe fails the system is left without
>>> a functioning display driver.
>>>
>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
>>> fails. This means that a DRM driver that fails to load reloads the system
>>> framebuffer driver.
>>>
>>> This works best with simpledrm. Without it Xorg won't recover because
>>> it still tries to load the vendor specific driver which ends up usually
>>> not working at all. With simpledrm the system recovers really nicely
>>> ending up with a working console and not a blank screen.
>>>
>>> There's a caveat in that some hardware might require some special magic
>>> register write to recover EFI display. I'd appreciate it a lot if
>>> maintainers could introduce a temporary failure in their drivers
>>> probe to validate that the sysfb recovers and they get a working console.
>>> The easiest way to double check it is by adding:
>>>    /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>    dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>    ret = -EINVAL;
>>>    goto out_error;
>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
>> Recovering the display like that is guess work and will at best work
>> with simple discrete devices where the framebuffer is always located in
>> a confined graphics aperture.
>>
>> But the problem you're trying to solve is a real one.
>>
>> What we'd want to do instead is to take the initial hardware state into
>> account when we do the initial mode-setting operation.
>>
>> The first step is to move each driver's remove_conflicting_devices call
>> to the latest possible location in the probe function. We usually do it
>> first, because that's easy. But on most hardware, it could happen much
>> later.
> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
> they request pci regions which is going to fail otherwise. Because
> grabbing the pci resources is in general the very first thing that
> those drivers need to do to setup anything, we
> remove_conflicting_devices first or at least very early.

To my knowledge, requesting resources is more about correctness than a 
hard requirement to use an I/O or memory range. Has this changed?


>
> I also don't think it's possible or even desirable by some drivers to
> reuse the initial state, good example here is vmwgfx where by default
> some people will setup their vm's with e.g. 8mb ram, when the vmwgfx
> loads we allow scanning out from system memory, so you can set your vm
> up with 8mb of vram but still use 4k resolutions when the driver
> loads, this way the suspend size of the vm is very predictable (tiny
> vram plus whatever ram was setup) while still allowing a lot of
> flexibility.

If there's no initial state to switch from, the first modeset can fail 
while leaving the display unusable. There's no way around that. Going 
back to the old state is not an option unless the driver has been 
written to support this.

The case of vmwgfx is special, but does not affect the overall problem.
For vmwgfx, it would be best to import that initial state and support a 
transparent modeset from vram to system memory (and back) at least 
during this initial state.


>
> In general I think however this is planned it's two or three separate series:
> 1) infrastructure to reload the sysfb driver (what this series is)
> 2) making sure that drivers that do want to recover cleanly actually
> clean out all the state on exit properly,
> 3) abstracting at least some of that cleanup in some driver independent way

That's really not going to work. For example, in the current series, you
invoke devm_aperture_remove_conflicting_pci_devices_done() after
drm_mode_reset(), drm_dev_register() and drm_client_setup(). Each of
these calls can modify hardware state. In the case of _register() and
_setup(), the DRM clients can perform a modeset, which destroys the
initial hardware state.

Patch 1 of this series removes the sysfb device/driver entirely. That
should be a no-go, as it significantly complicates recovery. For
example, if the native driver failed from an allocation failure, the
sysfb device/driver is not likely to come back either.

As the very first thing, the series should state which failures it is
going to resolve:

 - failed hardware init,
 - invalid initial modesetting,
 - runtime errors (such as ENOMEM, failed firmware loading),
 - others?

And then specify how a recovery to sysfb could look in each supported
scenario.

In terms of implementation, make any transition between drivers
gradual. The native driver needs to acquire the hardware resources
(framebuffer and I/O apertures) without unloading the sysfb driver.
Luckily there's struct drm_device.unplugged, which does that. [1]
Flipping this field disables hardware access for DRM drivers. All
sysfb drivers support this.

To get the sysfb drivers ready, I suggest dedicated helpers for each
driver's aperture. The aperture helpers can use these callbacks to flip
the DRM driver off and on again. For example, efidrm could do this as a
minimum:

  int efidrm_aperture_suspend()
  {
          dev->unplugged = true;
          remove_resource(/* framebuffer aperture */);
          return 0;
  }

  int efidrm_aperture_resume()
  {
          insert_resource(/* framebuffer aperture */);
          dev->unplugged = false;
          return 0;
  }

  struct aperture_funcs efidrm_aperture_funcs = {
          .suspend = efidrm_aperture_suspend,
          .resume = efidrm_aperture_resume,
  };

Pass this struct when efidrm acquires the framebuffer aperture, so that
the aperture helpers can control the behavior of efidrm. With this, a
multi-step takeover from sysfb to native driver can be tried.

It's still a massive effort that requires an audit of each driver's
probing logic. There's no copy-paste pattern AFAICT. I suggest picking
one simple driver first and making a prototype.

Let me also say that I DO like the general idea you're proposing. But
if it were easy, we would likely have done it already.

Best regards
Thomas
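A minimal userspace sketch of the multi-step takeover proposed above. It assumes the proposed struct aperture_funcs callbacks receive an opaque pointer to the sysfb driver's state; the proposal leaves the callback signatures open, so that parameter, model_sysfb and model_takeover() are hypothetical, and the booleans only stand in for drm_device.unplugged and the framebuffer aperture resource.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical callbacks following the suspend/resume proposal;
 * struct aperture_funcs does not exist in the kernel today.
 */
struct aperture_funcs {
	int (*suspend)(void *priv);
	int (*resume)(void *priv);
};

/* Userspace stand-in for a registered efidrm device. */
struct model_sysfb {
	bool unplugged;   /* stands in for drm_device.unplugged */
	bool fb_claimed;  /* stands in for the framebuffer aperture resource */
};

static int model_efidrm_suspend(void *priv)
{
	struct model_sysfb *s = priv;

	s->unplugged = true;   /* stop hardware access first */
	s->fb_claimed = false; /* then give up the aperture */
	return 0;
}

static int model_efidrm_resume(void *priv)
{
	struct model_sysfb *s = priv;

	s->fb_claimed = true;  /* reclaim the aperture first */
	s->unplugged = false;  /* then allow hardware access again */
	return 0;
}

static const struct aperture_funcs model_efidrm_funcs = {
	.suspend = model_efidrm_suspend,
	.resume = model_efidrm_resume,
};

static int model_native_probe_ok(void) { return 0; }
static int model_native_probe_fail(void) { return -EIO; }

/*
 * Multi-step takeover: the sysfb device stays registered throughout and is
 * only switched off; if the native probe fails, it is switched back on.
 */
static int model_takeover(const struct aperture_funcs *funcs, void *sysfb,
			  int (*native_probe)(void))
{
	int ret = funcs->suspend(sysfb);

	if (ret)
		return ret;
	ret = native_probe();
	if (ret)
		funcs->resume(sysfb);
	return ret;
}
```

The key difference from patch 1 of the series is that the sysfb device is never unregistered, only switched off, so switching it back on is cheap. The sketch does not model the harder part: whatever the native driver's failed probe already did to the display hardware.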
>
> z





* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 11:02     ` Thomas Zimmermann
@ 2026-01-15 14:39       ` Christian König
  2026-01-15 14:54         ` Thomas Zimmermann
  2026-01-15 15:10         ` Ville Syrjälä
  2026-01-16  3:59       ` Zack Rusin
  1 sibling, 2 replies; 28+ messages in thread
From: Christian König @ 2026-01-15 14:39 UTC (permalink / raw)
  To: Thomas Zimmermann, Zack Rusin
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

Sorry for being late, but I only now realized what you are doing here.

On 1/15/26 12:02, Thomas Zimmermann wrote:
> Hi,
> 
> apologies for the delay. I wanted to reply and then forgot about it.
> 
> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>> Hi
>>>
>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>
>>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>>> access to PCI resources so if the probe fails the system is left without
>>>> a functioning display driver.
>>>>
>>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>> framebuffer driver.
>>>>
>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>> it still tries to load the vendor specific driver which ends up usually
>>>> not working at all. With simpledrm the system recovers really nicely
>>>> ending up with a working console and not a blank screen.
>>>>
>>>> There's a caveat in that some hardware might require some special magic
>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>> maintainers could introduce a temporary failure in their drivers
>>>> probe to validate that the sysfb recovers and they get a working console.
>>>> The easiest way to double check it is by adding:
>>>>    /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>    dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>    ret = -EINVAL;
>>>>    goto out_error;
>>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
>>> Recovering the display like that is guess work and will at best work
>>> with simple discrete devices where the framebuffer is always located in
>>> a confined graphics aperture.
>>>
>>> But the problem you're trying to solve is a real one.
>>>
>>> What we'd want to do instead is to take the initial hardware state into
>>> account when we do the initial mode-setting operation.
>>>
>>> The first step is to move each driver's remove_conflicting_devices call
>>> to the latest possible location in the probe function. We usually do it
>>> first, because that's easy. But on most hardware, it could happen much
>>> later.
>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
>> they request pci regions which is going to fail otherwise. Because
>> grabbing the pci resources is in general the very first thing that
>> those drivers need to do to setup anything, we
>> remove_conflicting_devices first or at least very early.
> 
> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?

Nope, that is not correct.

At least for AMD GPUs, calling remove_conflicting_devices() really early is necessary, because otherwise some operations just result in a spontaneous system reboot.

For example, resizing the PCIe BAR that gives access to VRAM, or disabling VGA emulation (which AFAIK is used for EFI as well), is only possible once the VGA or EFI framebuffer driver has been kicked out.

And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.

So I absolutely have to reject the amdgpu patch in this series; it would break tons of use cases.

Regards,
Christian.

>> I also don't think it's possible or even desirable by some drivers to
>> reuse the initial state, good example here is vmwgfx where by default
>> some people will setup their vm's with e.g. 8mb ram, when the vmwgfx
>> loads we allow scanning out from system memory, so you can set your vm
>> up with 8mb of vram but still use 4k resolutions when the driver
>> loads, this way the suspend size of the vm is very predictable (tiny
>> vram plus whatever ram was setup) while still allowing a lot of
>> flexibility.
> 
> If there's no initial state to switch from, the first modeset can fail while leaving the display unusable. There's no way around that. Going back to the old state is not an option unless the driver has been written to support this.
> 
> The case of vmwgfx is special, but does not affect the overall problem. For vmwgfx, it would be best to import that initial state and support a transparent modeset from vram to system memory (and back) at least during this initial state.
> 
> 
>>
>> In general I think however this is planned it's two or three separate series:
>> 1) infrastructure to reload the sysfb driver (what this series is)
>> 2) making sure that drivers that do want to recover cleanly actually
>> clean out all the state on exit properly,
>> 3) abstracting at least some of that cleanup in some driver independent way
> 
> That's really not going to work. For example, in the current series, you invoke devm_aperture_remove_conflicting_pci_devices_done() after drm_mode_reset(), drm_dev_register() and drm_client_setup(). Each of these calls can modify hardware state. In the case of _register() and _setup(), the DRM clients can perform a modeset, which destroys the initial hardware state.
>
> Patch 1 of this series removes the sysfb device/driver entirely. That should be a no-go, as it significantly complicates recovery. For example, if the native driver failed from an allocation failure, the sysfb device/driver is not likely to come back either.
>
> As the very first thing, the series should state which failures it is going to resolve:
>
>  - failed hardware init,
>  - invalid initial modesetting,
>  - runtime errors (such as ENOMEM, failed firmware loading),
>  - others?
>
> And then specify how a recovery to sysfb could look in each supported scenario.
>
> In terms of implementation, make any transition between drivers gradual. The native driver needs to acquire the hardware resources (framebuffer and I/O apertures) without unloading the sysfb driver. Luckily there's struct drm_device.unplugged, which does that. [1] Flipping this field disables hardware access for DRM drivers. All sysfb drivers support this.
>
> To get the sysfb drivers ready, I suggest dedicated helpers for each driver's aperture. The aperture helpers can use these callbacks to flip the DRM driver off and on again. For example, efidrm could do this as a minimum:
>
>   int efidrm_aperture_suspend()
>   {
>           dev->unplugged = true;
>           remove_resource(/* framebuffer aperture */);
>           return 0;
>   }
>
>   int efidrm_aperture_resume()
>   {
>           insert_resource(/* framebuffer aperture */);
>           dev->unplugged = false;
>           return 0;
>   }
>
>   struct aperture_funcs efidrm_aperture_funcs = {
>           .suspend = efidrm_aperture_suspend,
>           .resume = efidrm_aperture_resume,
>   };
>
> Pass this struct when efidrm acquires the framebuffer aperture, so that the aperture helpers can control the behavior of efidrm. With this, a multi-step takeover from sysfb to native driver can be tried.
>
> It's still a massive effort that requires an audit of each driver's probing logic. There's no copy-paste pattern AFAICT. I suggest picking one simple driver first and making a prototype.
>
> Let me also say that I DO like the general idea you're proposing. But if it were easy, we would likely have done it already.
>
> Best regards
> Thomas
>>
>> z
> 



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 14:39       ` Christian König
@ 2026-01-15 14:54         ` Thomas Zimmermann
  2026-01-15 15:58           ` Christian König
  2026-01-15 15:10         ` Ville Syrjälä
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Zimmermann @ 2026-01-15 14:54 UTC (permalink / raw)
  To: Christian König, Zack Rusin
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

Hi

Am 15.01.26 um 15:39 schrieb Christian König:
> Sorry for being late, but I only now realized what you are doing here.
>
> On 1/15/26 12:02, Thomas Zimmermann wrote:
>> Hi,
>>
>> apologies for the delay. I wanted to reply and then forgot about it.
>>
>> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>> Hi
>>>>
>>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>>
>>>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>>>> access to PCI resources so if the probe fails the system is left without
>>>>> a functioning display driver.
>>>>>
>>>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
>>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>>> framebuffer driver.
>>>>>
>>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>>> it still tries to load the vendor specific driver which ends up usually
>>>>> not working at all. With simpledrm the system recovers really nicely
>>>>> ending up with a working console and not a blank screen.
>>>>>
>>>>> There's a caveat in that some hardware might require some special magic
>>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>>> maintainers could introduce a temporary failure in their drivers
>>>>> probe to validate that the sysfb recovers and they get a working console.
>>>>> The easiest way to double check it is by adding:
>>>>>     /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>>     dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>>     ret = -EINVAL;
>>>>>     goto out_error;
>>>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
>>>> Recovering the display like that is guess work and will at best work
>>>> with simple discrete devices where the framebuffer is always located in
>>>> a confined graphics aperture.
>>>>
>>>> But the problem you're trying to solve is a real one.
>>>>
>>>> What we'd want to do instead is to take the initial hardware state into
>>>> account when we do the initial mode-setting operation.
>>>>
>>>> The first step is to move each driver's remove_conflicting_devices call
>>>> to the latest possible location in the probe function. We usually do it
>>>> first, because that's easy. But on most hardware, it could happen much
>>>> later.
>>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
>>> they request pci regions which is going to fail otherwise. Because
>>> grabbing the pci resources is in general the very first thing that
>>> those drivers need to do to setup anything, we
>>> remove_conflicting_devices first or at least very early.
>> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
> Nope that is not correct.
>
> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.	

Here I was only talking about avoiding calls to request_resource() and 
similar interfaces.

>
> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.

Yeah, that's what I expected.

>
> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.

Assuming the driver (or driver author) is careful, is it possible to 
only read state from AMD hardware at such an early time?

We usually do remove_conflicting_devices() as the first thing in most
drivers' probe functions. As a first step, it would be helpful to
postpone it to a later point.
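The ordering being asked about can be sketched as a probe split into phases. This is a userspace model with hypothetical names; it encodes the constraint stated for AMD GPUs (disabling VGA emulation only after sysfb has been kicked out) and simply assumes, which is exactly the open question here, that reading the initial state is safe while sysfb is still bound.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of a probe split into a read-only phase and a takeover
 * phase. All names are hypothetical; whether real AMD hardware permits the
 * read-only phase is the open question in this thread.
 */
struct model_gpu {
	bool sysfb_bound;    /* firmware framebuffer driver still active */
	bool vga_emulation;  /* firmware-configured scanout still enabled */
	bool state_saved;    /* initial display state captured */
};

/* Read-only inspection: assumed allowed while sysfb still drives the display. */
static int model_read_initial_state(struct model_gpu *g)
{
	g->state_saved = true;
	return 0;
}

/* Destructive step: per the AMD discussion, only valid once sysfb is gone. */
static int model_disable_vga_emulation(struct model_gpu *g)
{
	if (g->sysfb_bound)
		return -EBUSY; /* on real hardware this may instead reboot the system */
	g->vga_emulation = false;
	return 0;
}

static void model_remove_conflicting_devices(struct model_gpu *g)
{
	g->sysfb_bound = false;
}

static int model_probe(struct model_gpu *g)
{
	int ret = model_read_initial_state(g);  /* phase 1: inspect only */

	if (ret)
		return ret;
	model_remove_conflicting_devices(g);    /* phase 2: kick sysfb as late as possible */
	return model_disable_vga_emulation(g);  /* phase 3: take over scanout */
}
```

Only phase 1 moves relative to today's drivers; the destructive steps still happen strictly after the firmware framebuffer is removed.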

>
> So I absolutely clearly have to reject the amdgpu patch in this series, that will break tons of use cases.

Don't worry, we're still in the early ideation phase.

Best regards
Thomas

>
> Regards,
> Christian.
>
>>> I also don't think it's possible or even desirable by some drivers to
>>> reuse the initial state, good example here is vmwgfx where by default
>>> some people will setup their vm's with e.g. 8mb ram, when the vmwgfx
>>> loads we allow scanning out from system memory, so you can set your vm
>>> up with 8mb of vram but still use 4k resolutions when the driver
>>> loads, this way the suspend size of the vm is very predictable (tiny
>>> vram plus whatever ram was setup) while still allowing a lot of
>>> flexibility.
>> If there's no initial state to switch from, the first modeset can fail while leaving the display unusable. There's no way around that. Going back to the old state is not an option unless the driver has been written to support this.
>>
>> The case of vmwgfx is special, but does not affect the overall problem. For vmwgfx, it would be best to import that initial state and support a transparent modeset from vram to system memory (and back) at least during this initial state.
>>
>>
>>> In general I think however this is planned it's two or three separate series:
>>> 1) infrastructure to reload the sysfb driver (what this series is)
>>> 2) making sure that drivers that do want to recover cleanly actually
>>> clean out all the state on exit properly,
>>> 3) abstracting at least some of that cleanup in some driver independent way
>> That's really not going to work. For example, in the current series, you invoke devm_aperture_remove_conflicting_pci_devices_done() after drm_mode_reset(), drm_dev_register() and drm_client_setup(). Each of these calls can modify hardware state. In the case of _register() and _setup(), the DRM clients can perform a modeset, which destroys the initial hardware state.
>>
>> Patch 1 of this series removes the sysfb device/driver entirely. That should be a no-go, as it significantly complicates recovery. For example, if the native driver failed from an allocation failure, the sysfb device/driver is not likely to come back either.
>>
>> As the very first thing, the series should state which failures it is going to resolve:
>>
>>  - failed hardware init,
>>  - invalid initial modesetting,
>>  - runtime errors (such as ENOMEM, failed firmware loading),
>>  - others?
>>
>> And then specify how a recovery to sysfb could look in each supported scenario.
>>
>> In terms of implementation, make any transition between drivers gradual. The native driver needs to acquire the hardware resources (framebuffer and I/O apertures) without unloading the sysfb driver. Luckily there's struct drm_device.unplugged, which does that. [1] Flipping this field disables hardware access for DRM drivers. All sysfb drivers support this.
>>
>> To get the sysfb drivers ready, I suggest dedicated helpers for each driver's aperture. The aperture helpers can use these callbacks to flip the DRM driver off and on again. For example, efidrm could do this as a minimum:
>>
>>   int efidrm_aperture_suspend()
>>   {
>>           dev->unplugged = true;
>>           remove_resource(/* framebuffer aperture */);
>>           return 0;
>>   }
>>
>>   int efidrm_aperture_resume()
>>   {
>>           insert_resource(/* framebuffer aperture */);
>>           dev->unplugged = false;
>>           return 0;
>>   }
>>
>>   struct aperture_funcs efidrm_aperture_funcs = {
>>           .suspend = efidrm_aperture_suspend,
>>           .resume = efidrm_aperture_resume,
>>   };
>>
>> Pass this struct when efidrm acquires the framebuffer aperture, so that the aperture helpers can control the behavior of efidrm. With this, a multi-step takeover from sysfb to native driver can be tried.
>>
>> It's still a massive effort that requires an audit of each driver's probing logic. There's no copy-paste pattern AFAICT. I suggest picking one simple driver first and making a prototype.
>>
>> Let me also say that I DO like the general idea you're proposing. But if it were easy, we would likely have done it already.
>>
>> Best regards
>> Thomas
>>> z

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)




* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 14:39       ` Christian König
  2026-01-15 14:54         ` Thomas Zimmermann
@ 2026-01-15 15:10         ` Ville Syrjälä
  2026-01-15 16:36           ` Gerd Hoffmann
  2026-01-16  7:39           ` Thomas Zimmermann
  1 sibling, 2 replies; 28+ messages in thread
From: Ville Syrjälä @ 2026-01-15 15:10 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Zimmermann, Zack Rusin, dri-devel, Alex Deucher, amd-gfx,
	Ard Biesheuvel, Ce Sun, Chia-I Wu, Danilo Krummrich, Dave Airlie,
	Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
	Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
	Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
	Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
	linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

On Thu, Jan 15, 2026 at 03:39:00PM +0100, Christian König wrote:
> Sorry for being late, but I only now realized what you are doing here.
> 
> On 1/15/26 12:02, Thomas Zimmermann wrote:
> > Hi,
> > 
> > apologies for the delay. I wanted to reply and then forgot about it.
> > 
> > Am 10.01.26 um 05:52 schrieb Zack Rusin:
> >> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >>> Hi
> >>>
> >>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
> >>>> Almost a rite of passage for every DRM developer and most Linux users
> >>>> is upgrading your DRM driver/updating boot flags/changing some config
> >>>> and having DRM driver fail at probe resulting in a blank screen.
> >>>>
> >>>> Currently there's no way to recover from DRM driver probe failure. PCI
> >>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
> >>>> access to PCI resources, so if the probe fails the system is left without
> >>>> a functioning display driver.
> >>>>
> >>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
> >>>> fails. This means that a DRM driver that fails to load reloads the system
> >>>> framebuffer driver.
> >>>>
> >>>> This works best with simpledrm. Without it Xorg won't recover because
> >>>> it still tries to load the vendor specific driver which ends up usually
> >>>> not working at all. With simpledrm the system recovers really nicely
> >>>> ending up with a working console and not a blank screen.
> >>>>
> >>>> There's a caveat in that some hardware might require some special magic
> >>>> register write to recover EFI display. I'd appreciate it a lot if
> >>>> maintainers could introduce a temporary failure in their drivers
> >>>> probe to validate that the sysfb recovers and they get a working console.
> >>>> The easiest way to double check it is by adding:
> >>>>    /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
> >>>>    dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
> >>>>    ret = -EINVAL;
> >>>>    goto out_error;
> >>>> or similar right after the devm_aperture_remove_conflicting_pci_devices() call.
> >>> Recovering the display like that is guess work and will at best work
> >>> with simple discrete devices where the framebuffer is always located in
> >>> a confined graphics aperture.
> >>>
> >>> But the problem you're trying to solve is a real one.
> >>>
> >>> What we'd want to do instead is to take the initial hardware state into
> >>> account when we do the initial mode-setting operation.
> >>>
> >>> The first step is to move each driver's remove_conflicting_devices call
> >>> to the latest possible location in the probe function. We usually do it
> >>> first, because that's easy. But on most hardware, it could happen much
> >>> later.
> >> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
> >> they request PCI regions, which would fail otherwise. Because
> >> grabbing the PCI resources is in general the very first thing that
> >> those drivers need to do to setup anything, we
> >> remove_conflicting_devices first or at least very early.
> > 
> > To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
> 
> Nope that is not correct.
> 
> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.	
> 
> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
> 
> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.

It's similar for Intel. For us VGA emulation won't be used for
EFI boot, but we still can't have the previous driver poking
around in memory while the real driver is initializing. The
entire memory layout may get completely shuffled so there's
no telling where such memory accesses would land.

And I suppose reBAR is a concern for us as well.

-- 
Ville Syrjälä
Intel


* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 14:54         ` Thomas Zimmermann
@ 2026-01-15 15:58           ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2026-01-15 15:58 UTC (permalink / raw)
  To: Thomas Zimmermann, Zack Rusin
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

On 1/15/26 15:54, Thomas Zimmermann wrote:
> Hi
> 
> Am 15.01.26 um 15:39 schrieb Christian König:
>> Sorry for being late, but I only now realized what you are doing here.
>>
>> On 1/15/26 12:02, Thomas Zimmermann wrote:
>>> Hi,
>>>
>>> apologies for the delay. I wanted to reply and then forgot about it.
>>>
>>> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>>>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>>> Hi
>>>>>
>>>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>>>
>>>>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>>>>> access to PCI resources, so if the probe fails the system is left without
>>>>>> a functioning display driver.
>>>>>>
>>>>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
>>>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>>>> framebuffer driver.
>>>>>>
>>>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>>>> it still tries to load the vendor specific driver which ends up usually
>>>>>> not working at all. With simpledrm the system recovers really nicely
>>>>>> ending up with a working console and not a blank screen.
>>>>>>
>>>>>> There's a caveat in that some hardware might require some special magic
>>>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>>>> maintainers could introduce a temporary failure in their drivers
>>>>>> probe to validate that the sysfb recovers and they get a working console.
>>>>>> The easiest way to double check it is by adding:
>>>>>>     /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>>>     dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>>>     ret = -EINVAL;
>>>>>>     goto out_error;
>>>>>> or similar right after the devm_aperture_remove_conflicting_pci_devices() call.
>>>>> Recovering the display like that is guess work and will at best work
>>>>> with simple discrete devices where the framebuffer is always located in
>>>>> a confined graphics aperture.
>>>>>
>>>>> But the problem you're trying to solve is a real one.
>>>>>
>>>>> What we'd want to do instead is to take the initial hardware state into
>>>>> account when we do the initial mode-setting operation.
>>>>>
>>>>> The first step is to move each driver's remove_conflicting_devices call
>>>>> to the latest possible location in the probe function. We usually do it
>>>>> first, because that's easy. But on most hardware, it could happen much
>>>>> later.
>>>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
>>>> they request PCI regions, which would fail otherwise. Because
>>>> grabbing the PCI resources is in general the very first thing that
>>>> those drivers need to do to setup anything, we
>>>> remove_conflicting_devices first or at least very early.
>>> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
>> Nope that is not correct.
>>
>> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.   
> 
> Here I was only talking about avoiding calls to request_resource() and similar interfaces.
> 
>>
>> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
> 
> Yeah, that's what I expected.
> 
>>
>> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.
> 
> Assuming the driver (or driver author) is careful, is it possible to only read state from AMD hardware at such an early time?

I'm not an expert on that particular stuff, but I strongly doubt it.

Basically the VGA emulation is firmware which "owns" the CRTC registers and might modify them at any time unless it's turned off first.

So you can't even use data/index pairs of registers etc...
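To make the data/index point concrete, here is a small userspace model. It is a sketch under stated assumptions, not real register access: an array stands in for the CRTC register file, and struct crtc_regs, write_index(), read_data() and driver_read() are hypothetical names. It shows that if firmware rewrites the index port between the driver's index write and its data read, the driver silently reads a different register, so even "read-only" probing is unsafe while the firmware still owns those registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a VGA-style index/data register pair.
 * No hardware is touched: an array stands in for the register file. */
struct crtc_regs {
	uint8_t index;     /* last value written to the index port */
	uint8_t data[256]; /* registers reachable through the data port */
};

static void write_index(struct crtc_regs *r, uint8_t idx)
{
	r->index = idx;
}

static uint8_t read_data(const struct crtc_regs *r)
{
	return r->data[r->index];
}

/*
 * Driver-side read of register @idx: select it, then read it.
 * @fw_idx models firmware (VGA emulation) writing the index port
 * between the two steps; pass a negative value for "no interference".
 */
static uint8_t driver_read(struct crtc_regs *r, uint8_t idx, int fw_idx)
{
	assert(fw_idx < 256);

	write_index(r, idx);
	if (fw_idx >= 0)
		write_index(r, (uint8_t)fw_idx); /* concurrent firmware access */
	return read_data(r);
}
```

With no interference the driver gets the register it asked for; with one interleaved firmware index write it gets whatever register the firmware selected, and nothing in the read path can detect that.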

> We usually do remove_conflicting_devices() as the first thing in most drivers' probe functions. As a first step, it would be helpful to postpone it to a later point.

Well, from what I know, that won't work in a lot of cases.

I mean what we could do on non-AMD HW is to remove the conflicting driver, play with the HW and, if we find that this didn't work, reset the HW using a PCI function level reset and try to load the EFI or whatever driver again. But that has a rather low chance of working reliably, I would say.

The problem with AMD GPUs is that the PCI function level reset is broken to begin with (which has already caused us tons of headaches in the case of passthrough).
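The fallback described above for non-AMD hardware (remove the conflicting driver, try native init, reset via PCI function level reset on failure, then reload the firmware framebuffer driver) can be modeled as follows. This is a hedged userspace sketch: struct probe_env and probe_with_flr_fallback() are hypothetical, and the booleans stand in for outcomes a real driver would only learn from the hardware and the PCI core. It makes explicit why the path is fragile: it only ends with a usable display if the reset works and the firmware-programmed scanout is still valid afterwards, which is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; a real driver would learn these from hardware
 * init and from the PCI core. */
struct probe_env {
	bool native_init_ok;       /* native driver brought the GPU up */
	bool flr_works;            /* function level reset actually works */
	bool scanout_survives_flr; /* firmware scanout still valid after reset */
	bool sysfb_bound;          /* firmware framebuffer driver is bound */
};

enum display_state { DISPLAY_NATIVE, DISPLAY_SYSFB, DISPLAY_BLANK };

static enum display_state probe_with_flr_fallback(struct probe_env *env)
{
	assert(env);

	/* Kick out the firmware framebuffer driver first. */
	env->sysfb_bound = false;

	/* Native init may reprogram arbitrary hardware state. */
	if (env->native_init_ok)
		return DISPLAY_NATIVE;

	/* Without a working FLR there is no way back to a known state. */
	if (!env->flr_works)
		return DISPLAY_BLANK;

	/* An FLR returns the function to its reset state; the scanout the
	 * firmware programmed at boot is not guaranteed to survive it. */
	if (!env->scanout_survives_flr)
		return DISPLAY_BLANK;

	env->sysfb_bound = true; /* reload the EFI/VGA framebuffer driver */
	return DISPLAY_SYSFB;
}
```

Three of the four outcomes in this model leave either the native driver or a blank screen; only the case where every assumption holds gets the firmware framebuffer back.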

Regards,
Christian.

> 
>>
>> So I absolutely clearly have to reject the amdgpu patch in this series, that will break tons of use cases.
> 
> Don't worry, we're still in the early ideation phase.
> 
> Best regards
> Thomas
> 
>>
>> Regards,
>> Christian.
>>
>>>> I also don't think it's possible or even desirable by some drivers to
>>>> reuse the initial state, good example here is vmwgfx where by default
>>>> some people will setup their vm's with e.g. 8mb ram, when the vmwgfx
>>>> loads we allow scanning out from system memory, so you can set your vm
>>>> up with 8mb of vram but still use 4k resolutions when the driver
>>>> loads, this way the suspend size of the vm is very predictable (tiny
>>>> vram plus whatever ram was setup) while still allowing a lot of
>>>> flexibility.
>>> If there's no initial state to switch from, the first modeset can fail while leaving the display unusable. There's no way around that. Going back to the old state is not an option unless the driver has been written to support this.
>>>
>>> The case of vmwgfx is special, but does not affect the overall problem. For vmwgfx, it would be best to import that initial state and support a transparent modeset from vram to system memory (and back) at least during this initial state.
>>>
>>>
>>>> In general I think that, however this is planned, it's two or three separate series:
>>>> 1) infrastructure to reload the sysfb driver (what this series is)
>>>> 2) making sure that drivers that do want to recover cleanly actually
>>>> clean out all the state on exit properly,
>>>> 3) abstracting at least some of that cleanup in some driver independent way
>>> That's really not going to work. For example, in the current series, you
>>> invoke devm_aperture_remove_conflicting_pci_devices_done() after
>>> drm_mode_reset(), drm_dev_register() and drm_client_setup(). Each of
>>> these calls can modify hardware state. In the case of _register() and
>>> _setup(), the DRM clients can perform a modeset, which destroys the
>>> initial hardware state.
>>>
>>> Patch 1 of this series removes the sysfb device/driver entirely. That
>>> should be a no-go as it significantly complicates recovery. For example,
>>> if the native driver failed due to an allocation failure, the sysfb
>>> device/driver is not likely to come back either.
>>>
>>> As the very first thing, the series should state which failures it is
>>> going to resolve:
>>>
>>>  - failed hardware init,
>>>  - invalid initial modesetting,
>>>  - runtime errors (such as ENOMEM, failed firmware loading),
>>>  - others?
>>>
>>> And then specify how a recovery to sysfb could look in each supported
>>> scenario.
>>>
>>> In terms of implementation, make any transition between drivers
>>> gradual. The native driver needs to acquire the hardware resources
>>> (framebuffer and I/O apertures) without unloading the sysfb driver.
>>> Luckily there's struct drm_device.unplug, which does that. [1] Flipping
>>> this field disables hardware access for DRM drivers. All sysfb drivers
>>> support this.
>>>
>>> To get the sysfb drivers ready, I suggest dedicated helpers for each
>>> driver's aperture. The aperture helpers can use these callbacks to flip
>>> the DRM driver off and on again. For example, efidrm could do this as a
>>> minimum:
>>>
>>>   int efidrm_aperture_suspend()
>>>   {
>>>           dev->unplug = true;
>>>           remove_resource(/* framebuffer aperture */);
>>>           return 0;
>>>   }
>>>
>>>   int efidrm_aperture_resume()
>>>   {
>>>           insert_resource(/* framebuffer aperture */);
>>>           dev->unplug = false;
>>>           return 0;
>>>   }
>>>
>>>   struct aperture_funcs efidrm_aperture_funcs = {
>>>           .suspend = efidrm_aperture_suspend,
>>>           .resume = efidrm_aperture_resume,
>>>   };
>>>
>>> Pass this struct when efidrm acquires the framebuffer aperture, so that
>>> the aperture helpers can control the behavior of efidrm. With this, a
>>> multi-step takeover from sysfb to native driver can be tried.
>>>
>>> It's still a massive effort that requires an audit of each driver's
>>> probing logic. There's no copy-paste pattern AFAICT. I suggest picking
>>> one simple driver first and making a prototype.
>>>
>>> Let me also say that I DO like the general idea you're proposing. But if
>>> it were easy, we would likely have done it already.
>>>
>>> Best regards
>>> Thomas
>>>> z
> 



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 15:10         ` Ville Syrjälä
@ 2026-01-15 16:36           ` Gerd Hoffmann
  2026-01-15 16:39             ` Mario Limonciello
  2026-01-16  7:39           ` Thomas Zimmermann
  1 sibling, 1 reply; 28+ messages in thread
From: Gerd Hoffmann @ 2026-01-15 16:36 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Christian König, Thomas Zimmermann, Zack Rusin, dri-devel,
	Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
	Danilo Krummrich, Dave Airlie, Deepak Rawat, Dmitry Osipenko,
	Gurchetan Singh, Hans de Goede, Hawking Zhang, Helge Deller,
	intel-gfx, intel-xe, Jani Nikula, Javier Martinez Canillas,
	Jocelyn Falempe, Joonas Lahtinen, Lijo Lazar, linux-efi,
	linux-fbdev, linux-hyperv, linux-kernel, Lucas De Marchi,
	Lyude Paul, Maarten Lankhorst, Mario Limonciello (AMD),
	Mario Limonciello, Maxime Ripard, nouveau, Rodrigo Vivi,
	Simona Vetter, spice-devel, Thomas Hellström,
	Timur Kristóf, Tvrtko Ursulin, virtualization,
	Vitaly Prosyak

  Hi,

> > At least for AMD GPUs remove_conflicting_devices() really early is
> > necessary because otherwise some operations just result in a
> > spontaneous system reboot.	

> It's similar for Intel. For us VGA emulation won't be used for EFI
> boot, but we still can't have the previous driver poking around in
> memory while the real driver is initializing. The entire memory layout
> may get completely shuffled so there's no telling where such memory
> accesses would land.

Can you do stuff like checking which firmware is needed and whether
that can be loaded from the filesystem before calling
remove_conflicting_devices() ?

take care,
  Gerd



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 16:36           ` Gerd Hoffmann
@ 2026-01-15 16:39             ` Mario Limonciello
  0 siblings, 0 replies; 28+ messages in thread
From: Mario Limonciello @ 2026-01-15 16:39 UTC (permalink / raw)
  To: Gerd Hoffmann, Ville Syrjälä
  Cc: Christian König, Thomas Zimmermann, Zack Rusin, dri-devel,
	Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
	Danilo Krummrich, Dave Airlie, Deepak Rawat, Dmitry Osipenko,
	Gurchetan Singh, Hans de Goede, Hawking Zhang, Helge Deller,
	intel-gfx, intel-xe, Jani Nikula, Javier Martinez Canillas,
	Jocelyn Falempe, Joonas Lahtinen, Lijo Lazar, linux-efi,
	linux-fbdev, linux-hyperv, linux-kernel, Lucas De Marchi,
	Lyude Paul, Maarten Lankhorst, Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

On 1/15/26 10:36 AM, Gerd Hoffmann wrote:
>    Hi,
> 
>>> At least for AMD GPUs remove_conflicting_devices() really early is
>>> necessary because otherwise some operations just result in a
>>> spontaneous system reboot.	
> 
>> It's similar for Intel. For us VGA emulation won't be used for EFI
>> boot, but we still can't have the previous driver poking around in
>> memory while the real driver is initializing. The entire memory layout
>> may get completely shuffled so there's no telling where such memory
>> accesses would land.
> 
> Can you do stuff like checking which firmware is needed and whether
> that can be loaded from the filesystem before calling
> remove_conflicting_devices() ?
> 

That's something that I did in amdgpu a few years back.

I pushed the identification and ability to load firmware into early init 
stages.  It means that if you have a brand new GPU and run a modern 
kernel with an older linux-firmware snapshot amdgpu will fail probe and 
your framebuffer from EFI keeps working.
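The ordering described here reduces to a two-phase probe. The sketch below is a userspace model that assumes chip identification and firmware lookup have no side effects on the display; struct probe_inputs and probe() are hypothetical, not amdgpu code (a real driver would do the lookup with request_firmware() or similar). Moving the check ahead of the point of no return turns the "new GPU, old linux-firmware" case from a blank screen into a failed probe with the EFI framebuffer intact. It does not help with failures that only show up during the destructive phase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probe inputs. */
struct probe_inputs {
	bool fw_present;     /* all firmware images for this chip are installed */
	bool hw_init_ok;     /* the remaining, destructive init succeeds */
	bool fw_check_early; /* check firmware before kicking out sysfb */
};

enum probe_result {
	PROBE_OK,              /* native driver drives the display */
	PROBE_FAIL_SYSFB_LIVE, /* probe failed, EFI framebuffer untouched */
	PROBE_FAIL_BLANK,      /* probe failed after sysfb was removed */
};

static enum probe_result probe(const struct probe_inputs *in)
{
	assert(in);

	/* Phase 1: side-effect-free checks (chip identification, firmware
	 * lookup). Nothing has touched the display yet. */
	if (in->fw_check_early && !in->fw_present)
		return PROBE_FAIL_SYSFB_LIVE;

	/* Point of no return: this is where remove_conflicting_devices()
	 * would run and the firmware framebuffer goes away. */

	/* Phase 2: destructive init, including any late firmware loading. */
	if (!in->fw_present || !in->hw_init_ok)
		return PROBE_FAIL_BLANK;

	return PROBE_OK;
}
```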


* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 11:02     ` Thomas Zimmermann
  2026-01-15 14:39       ` Christian König
@ 2026-01-16  3:59       ` Zack Rusin
  2026-01-16  7:58         ` Thomas Zimmermann
  1 sibling, 1 reply; 28+ messages in thread
From: Zack Rusin @ 2026-01-16  3:59 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
	Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
	Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
	Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
	Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
	linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak


On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> That's really not going to work. For example, in the current series, you
> invoke devm_aperture_remove_conflicting_pci_devices_done() after
> drm_mode_reset(), drm_dev_register() and drm_client_setup().

That's perfectly fine:
devm_aperture_remove_conflicting_pci_devices_done only removes the
reload behavior; it doesn't do anything else.

This series, essentially, just adds a "defer" statement to
aperture_remove_conflicting_pci_devices that says

"reload sysfb if this driver unloads".

devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.

You could ask why have
devm_aperture_remove_conflicting_pci_devices_done at all then and it's
because I didn't want to change the default behavior of anything.

There are three cases:
1) Driver fails to load before
aperture_remove_conflicting_pci_devices, in which case sysfb is still
active and there's no problem,
2) Driver fails to load after aperture_remove_conflicting_pci_devices,
in which case sysfb is gone and the screen is blank
3) Driver is unloaded after the probe succeeded. igt tests this too.

Without devm_aperture_remove_conflicting_pci_devices_done we'd try to
reload sysfb in #3, which, in general makes sense to me and I'd
probably remove it in my drivers, but there might be people or tests
(again, igt does it and we don't need to flip-flop between sysfb and
the driver there) that depend on specifically that behavior of not
having anything driving fb so I didn't want to change it.
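The "defer" semantics and the three cases above fit in a few lines. This is a hedged userspace model: struct sysfb_model and the three functions are hypothetical stand-ins, and the idea that the series implements the defer as a devres action (for instance registered with devm_add_action_or_reset() and cancelled with devm_remove_action()) is an assumption, not something stated here. It also deliberately leaves out the objection raised in this thread: it flips sysfb back on without checking what the native driver did to the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reload "defer"; no kernel APIs involved. */
struct sysfb_model {
	bool active;       /* firmware framebuffer driver is bound */
	bool reload_armed; /* reload sysfb when the native driver goes away */
};

/* Analogue of aperture_remove_conflicting_pci_devices() in the series:
 * unbind sysfb and arm the reload. */
static void remove_conflicting(struct sysfb_model *s)
{
	s->active = false;
	s->reload_armed = true;
}

/* Analogue of devm_aperture_remove_conflicting_pci_devices_done():
 * cancel the pending reload, nothing else. */
static void remove_conflicting_done(struct sysfb_model *s)
{
	s->reload_armed = false;
}

/* Runs on probe failure or on unbind, like devres teardown. */
static void native_driver_release(struct sysfb_model *s)
{
	if (s->reload_armed) {
		s->reload_armed = false;
		s->active = true; /* re-register sysfb; HW state not checked */
	}
	assert(!s->reload_armed);
}
```

Case 1 never arms the reload, so sysfb simply stays bound. Case 2 triggers the reload from the failed probe's teardown. Case 3 disarms it with the _done() call, so a later unbind leaves no framebuffer driver, which matches the behavior before the series.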

So with this series the worst case scenario is that the driver that
failed after aperture_remove_conflicting_pci_devices changed the
hardware state so much that sysfb can't recover and the fb is blank.
So it was blank before and this series can't fix it because the driver
in its cleanup routine will need to do more unwinding for sysfb to
reload (i.e. we'd need an extra patch to unwind the driver state).
There also might be the case of some crazy behavior, e.g. a PCI BAR
resize in the driver makes the VGA hardware crash or something, in
which case, yeah, we should definitely skip this patch, at least until
those drivers properly cleanup on exit.

z



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-15 15:10         ` Ville Syrjälä
  2026-01-15 16:36           ` Gerd Hoffmann
@ 2026-01-16  7:39           ` Thomas Zimmermann
  1 sibling, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2026-01-16  7:39 UTC (permalink / raw)
  To: Ville Syrjälä, Christian König
  Cc: Zack Rusin, dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel,
	Ce Sun, Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

Hi

Am 15.01.26 um 16:10 schrieb Ville Syrjälä:
> On Thu, Jan 15, 2026 at 03:39:00PM +0100, Christian König wrote:
> >> Sorry for being late, but I only now realized what you are doing here.
>>
>> On 1/15/26 12:02, Thomas Zimmermann wrote:
>>> Hi,
>>>
>>> apologies for the delay. I wanted to reply and then forgot about it.
>>>
>>> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>>>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>>> Hi
>>>>>
>>>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>>>
>>>>>> Currently there's no way to recover from DRM driver probe failure. PCI
> >>>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
> >>>>>> access to PCI resources, so if the probe fails the system is left without
> >>>>>> a functioning display driver.
> >>>>>>
> >>>>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
>>>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>>>> framebuffer driver.
>>>>>>
>>>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>>>> it still tries to load the vendor specific driver which ends up usually
>>>>>> not working at all. With simpledrm the system recovers really nicely
>>>>>> ending up with a working console and not a blank screen.
>>>>>>
>>>>>> There's a caveat in that some hardware might require some special magic
>>>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>>>> maintainers could introduce a temporary failure in their drivers
>>>>>> probe to validate that the sysfb recovers and they get a working console.
>>>>>> The easiest way to double check it is by adding:
>>>>>>     /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>>>     dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>>>     ret = -EINVAL;
>>>>>>     goto out_error;
> >>>>>> or similar right after the devm_aperture_remove_conflicting_pci_devices() call.
>>>>> Recovering the display like that is guess work and will at best work
>>>>> with simple discrete devices where the framebuffer is always located in
>>>>> a confined graphics aperture.
>>>>>
>>>>> But the problem you're trying to solve is a real one.
>>>>>
>>>>> What we'd want to do instead is to take the initial hardware state into
>>>>> account when we do the initial mode-setting operation.
>>>>>
>>>>> The first step is to move each driver's remove_conflicting_devices call
>>>>> to the latest possible location in the probe function. We usually do it
>>>>> first, because that's easy. But on most hardware, it could happen much
>>>>> later.
> >>>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
> >>>> they request PCI regions, which would fail otherwise. Because
> >>>> grabbing the PCI resources is in general the very first thing that
>>>> those drivers need to do to setup anything, we
>>>> remove_conflicting_devices first or at least very early.
>>> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
>> Nope that is not correct.
>>
>> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.	
>>
>> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
>>
>> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.
> It's similar for Intel. For us VGA emulation won't be used for
> EFI boot, but we still can't have the previous driver poking
> around in memory while the real driver is initializing. The
> entire memory layout may get completely shuffled so there's
> no telling where such memory accesses would land.

Isn't there code in display/intel_fbdev.c that reads back the old state 
from hardware before initializing fbdev? [1] How does that work then? 
Wouldn't the HW state be invalid already?

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v6.18.5/source/drivers/gpu/drm/i915/display/intel_fbdev.c#L356

>
> And I suppose reBAR is a concern for us as well.
>





* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-16  3:59       ` Zack Rusin
@ 2026-01-16  7:58         ` Thomas Zimmermann
  2026-01-17  6:02           ` Zack Rusin
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Zimmermann @ 2026-01-16  7:58 UTC (permalink / raw)
  To: Zack Rusin
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
	Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
	Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
	Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
	Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
	linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

Hi

Am 16.01.26 um 04:59 schrieb Zack Rusin:
> On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> That's really not going to work. For example, in the current series, you
>> invoke devm_aperture_remove_conflicting_pci_devices_done() after
>> drm_mode_reset(), drm_dev_register() and drm_client_setup().
> That's perfectly fine:
> devm_aperture_remove_conflicting_pci_devices_done only removes the
> reload behavior; it doesn't do anything else.
>
> This series, essentially, just adds a "defer" statement to
> aperture_remove_conflicting_pci_devices that says
>
> "reload sysfb if this driver unloads".
>
> devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.

Exactly. And if that reload happens after the hardware state has been 
changed, the result is undefined.

>
> You could ask why have
> devm_aperture_remove_conflicting_pci_devices_done at all then and it's
> because I didn't want to change the default behavior of anything.
>
> There are three cases:
> 1) Driver fails to load before
> aperture_remove_conflicting_pci_devices, in which case sysfb is still
> active and there's no problem.
> 2) Driver fails to load after aperture_remove_conflicting_pci_devices,
> in which case sysfb is gone and the screen is blank.
> 3) Driver is unloaded after the probe succeeded. igt tests this too.
>
> Without devm_aperture_remove_conflicting_pci_devices_done, we'd try to
> reload sysfb in #3, which, in general, makes sense to me, and I'd
> probably remove it in my drivers. But there might be people or tests
> (again, igt does this, and there we don't need to flip-flop between
> sysfb and the driver) that depend specifically on the behavior of not
> having anything driving the fb, so I didn't want to change it.
>
> So with this series, the worst-case scenario is that the driver that
> failed after aperture_remove_conflicting_pci_devices changed the
> hardware state so much that sysfb can't recover and the fb stays blank.
> But the screen was blank before this series too, and the series can't
> fix that case, because the driver's cleanup routine would need to do
> more unwinding for sysfb to reload (i.e. we'd need an extra patch to
> unwind the driver state).
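
The defer/cancel semantics and the three cases just described can be modeled in a few lines of userspace C. This is a minimal sketch, not code from the series: every `model_*` name is hypothetical, "teardown" stands in for managed-resource (devres) cleanup, and the model only tracks which driver owns the display. It deliberately ignores hardware state, which is exactly what the objection in this thread is about.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the "defer" semantics described in this thread.
 * All names are hypothetical: this is NOT the kernel aperture API, and it
 * deliberately ignores hardware state.
 */
struct model_dev {
	bool sysfb_active;  /* firmware framebuffer currently drives the display */
	bool restore_armed; /* deferred "reload sysfb on unload" is pending */
};

/* Stands in for aperture_remove_conflicting_pci_devices() with the series
 * applied: kick out sysfb and arm the deferred restore. */
static void model_remove_conflicting(struct model_dev *d)
{
	d->sysfb_active = false;
	d->restore_armed = true;
}

/* Stands in for devm_aperture_remove_conflicting_pci_devices_done():
 * cancel the deferred restore, leaving sysfb removed. */
static void model_remove_conflicting_done(struct model_dev *d)
{
	d->restore_armed = false;
}

/* Stands in for managed-resource teardown on probe failure or unbind. */
static void model_driver_teardown(struct model_dev *d)
{
	/* A restore can only be pending while sysfb is gone. */
	assert(!d->restore_armed || !d->sysfb_active);
	if (d->restore_armed) {
		d->restore_armed = false;
		d->sysfb_active = true; /* "reload sysfb" */
	}
}

enum model_case {
	FAIL_BEFORE_REMOVE, /* case 1: probe fails before the aperture call */
	FAIL_AFTER_REMOVE,  /* case 2: probe fails after the aperture call */
	UNLOAD_AFTER_PROBE, /* case 3: probe succeeded, driver unloaded later */
};

/* Returns whether sysfb drives the display once the driver is gone. */
static bool model_sysfb_after(enum model_case c)
{
	struct model_dev d = { .sysfb_active = true, .restore_armed = false };

	if (c != FAIL_BEFORE_REMOVE)
		model_remove_conflicting(&d);
	if (c == UNLOAD_AFTER_PROBE)
		model_remove_conflicting_done(&d);
	model_driver_teardown(&d);
	return d.sysfb_active;
}
```

Case 2 is the behavior the series adds: the armed restore fires on teardown. Case 3 keeps today's behavior, because the `_done()` call disarms the restore before the driver is unloaded.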

The current recovery/reload is not reliable in any case. A number of 
high-profile devs have also said that it doesn't work with their driver. 
The same is true for ast. So the current approach is not going to happen.

> There also might be the case of some crazy behavior, e.g. a PCI BAR
> resize in the driver makes the VGA hardware crash or something, in
> which case, yea, we should definitely skip this patch, at least until
> those drivers properly clean up on exit.

There's nothing crazy here. It's standard probing code.

If you want to move forward, my suggestion is to look at the proposal 
with the aperture_funcs callbacks that control sysfb device access. And 
from there, build a full prototype with one or two drivers.

Best regards
Thomas


>
> z

-- 
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)



* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-16  7:58         ` Thomas Zimmermann
@ 2026-01-17  6:02           ` Zack Rusin
  2026-01-19 10:03             ` Christian König
  0 siblings, 1 reply; 28+ messages in thread
From: Zack Rusin @ 2026-01-17  6:02 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
	Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
	Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
	Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
	Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
	linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

On Fri, Jan 16, 2026 at 2:58 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> On 16.01.26 at 04:59, Zack Rusin wrote:
> > On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >> That's really not going to work. For example, in the current series, you
> >> invoke devm_aperture_remove_conflicting_pci_devices_done() after
> >> drm_mode_reset(), drm_dev_register() and drm_client_setup().
> > That's perfectly fine,
> > devm_aperture_remove_conflicting_pci_devices_done only removes the
> > reload behavior; it doesn't do anything else.
> >
> > This series, essentially, just adds a "defer" statement to
> > aperture_remove_conflicting_pci_devices that says
> >
> > "reload sysfb if this driver unloads".
> >
> > devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.
>
> Exactly. And if that reload happens after the hardware state has been
> changed, the result is undefined.

This is all predicated on drivers actually cleaning up after
themselves. I don't think any amount of good will or API design is
going to fix device specific state mismatches.

> The current recovery/reload is not reliable in any case. A number of
> high-profile devs have also said that it doesn't work with their driver.
> The same is true for ast. So the current approach is not going to happen.
>
> > There also might be the case of some crazy behavior, e.g. a PCI BAR
> > resize in the driver makes the VGA hardware crash or something, in
> > which case, yea, we should definitely skip this patch, at least until
> > those drivers properly clean up on exit.
>
> There's nothing crazy here. It's standard probing code.
>
> If you want to move forward, my suggestion is to look at the proposal
> with the aperture_funcs callbacks that control sysfb device access. And
> from there, build a full prototype with one or two drivers.

I don't think that approach is going to work. I don't think there's
anything that can be done if drivers didn't clean up everything they've
done that might have broken sysfb on unload. I'm going to drop it,
then. It's obviously a shame, because it works fine with virtualized
drivers, and they're the ones that would likely profit from this the
most, but I'm sceptical that I could do a full system state reset in a
generalized fashion for HW drivers, or that the work required would be
worth the payoff.

z

* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
  2026-01-17  6:02           ` Zack Rusin
@ 2026-01-19 10:03             ` Christian König
  0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2026-01-19 10:03 UTC (permalink / raw)
  To: Zack Rusin, Thomas Zimmermann
  Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
	Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
	Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
	Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
	Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
	Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
	Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
	Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
	nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
	Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
	virtualization, Vitaly Prosyak

On 1/17/26 07:02, Zack Rusin wrote:
> On Fri, Jan 16, 2026 at 2:58 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>
>> Hi
>>
>> On 16.01.26 at 04:59, Zack Rusin wrote:
>>> On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>> That's really not going to work. For example, in the current series, you
>>>> invoke devm_aperture_remove_conflicting_pci_devices_done() after
>>>> drm_mode_reset(), drm_dev_register() and drm_client_setup().
>>> That's perfectly fine,
>>> devm_aperture_remove_conflicting_pci_devices_done only removes the
>>> reload behavior; it doesn't do anything else.
>>>
>>> This series, essentially, just adds a "defer" statement to
>>> aperture_remove_conflicting_pci_devices that says
>>>
>>> "reload sysfb if this driver unloads".
>>>
>>> devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.
>>
>> Exactly. And if that reload happens after the hardware state has been
>> changed, the result is undefined.
> 
> This is all predicated on drivers actually cleaning up after
> themselves. I don't think any amount of good will or API design is
> going to fix device specific state mismatches.
> 
>> The current recovery/reload is not reliable in any case. A number of
>> high-profile devs have also said that it doesn't work with their driver.
>> The same is true for ast. So the current approach is not going to happen.
>>
>>> There also might be the case of some crazy behavior, e.g. a PCI BAR
>>> resize in the driver makes the VGA hardware crash or something, in
>>> which case, yea, we should definitely skip this patch, at least until
>>> those drivers properly clean up on exit.
>>
>> There's nothing crazy here. It's standard probing code.
>>
>> If you want to move forward, my suggestion is to look at the proposal
>> with the aperture_funcs callbacks that control sysfb device access. And
>> from there, build a full prototype with one or two drivers.
> 
> I don't think that approach is going to work. I don't think there's
> anything that can be done if drivers didn't clean up everything they've
> done that might have broken sysfb on unload. I'm going to drop it,
> then. It's obviously a shame, because it works fine with virtualized
> drivers, and they're the ones that would likely profit from this the
> most, but I'm sceptical that I could do a full system state reset in a
> generalized fashion for HW drivers, or that the work required would be
> worth the payoff.

Well, at least for PCI devices you could try doing a function-level reset (FLR) to get the HW back into some usable state.

This does *not* work for AMD HW since we have HW/FW bugs, but at least for your virtualized use case it might work.

All you need then is an EFI, VESA or int10 call to re-init the HW to the pre-driver-load setup.

I know that is not the easiest thing to do, but still better than a black screen.

Regards,
Christian.

> 
> z


end of thread

Thread overview: 28+ messages
2025-12-29 21:58 [PATCH 00/12] Recover sysfb after DRM probe failure Zack Rusin
2025-12-29 21:58 ` [PATCH 01/12] video/aperture: Add sysfb restore on " Zack Rusin
2025-12-29 21:58 ` [PATCH 02/12] drm/vmwgfx: Use devm aperture helpers for sysfb restore on " Zack Rusin
2025-12-29 21:58 ` [PATCH 03/12] drm/xe: " Zack Rusin
2025-12-29 21:58 ` [PATCH 04/12] drm/amdgpu: " Zack Rusin
2025-12-29 21:58 ` [PATCH 05/12] drm/virtio: Add " Zack Rusin
2026-01-15 10:12   ` Dmitry Osipenko
2025-12-29 21:58 ` [PATCH 06/12] drm/nouveau: Use devm aperture helpers for " Zack Rusin
2025-12-29 21:58 ` [PATCH 07/12] drm/qxl: " Zack Rusin
2025-12-29 21:58 ` [PATCH 08/12] drm/vboxvideo: " Zack Rusin
2025-12-29 21:58 ` [PATCH 09/12] drm/hyperv: Add " Zack Rusin
2025-12-29 21:58 ` [PATCH 10/12] drm/ast: Use devm aperture helpers for " Zack Rusin
2025-12-29 21:58 ` [PATCH 11/12] drm/radeon: " Zack Rusin
2025-12-29 21:58 ` [PATCH 12/12] drm/i915: " Zack Rusin
2026-01-09 10:34 ` [PATCH 00/12] Recover sysfb after DRM " Thomas Zimmermann
2026-01-10  4:52   ` Zack Rusin
2026-01-15 11:02     ` Thomas Zimmermann
2026-01-15 14:39       ` Christian König
2026-01-15 14:54         ` Thomas Zimmermann
2026-01-15 15:58           ` Christian König
2026-01-15 15:10         ` Ville Syrjälä
2026-01-15 16:36           ` Gerd Hoffmann
2026-01-15 16:39             ` Mario Limonciello
2026-01-16  7:39           ` Thomas Zimmermann
2026-01-16  3:59       ` Zack Rusin
2026-01-16  7:58         ` Thomas Zimmermann
2026-01-17  6:02           ` Zack Rusin
2026-01-19 10:03             ` Christian König
