* [PATCH] fbdev: sys_fillrect: Add bounds checking to prevent vmalloc-out-of-bounds
From: Osama Abdelkader @ 2026-01-18 0:18 UTC (permalink / raw)
To: Zsolt Kajtar, Simona Vetter, Helge Deller, Osama Abdelkader,
Thomas Zimmermann, linux-fbdev, dri-devel, linux-kernel
Cc: syzbot+7a63ce155648954e749b
The sys_fillrect function was missing bounds validation, which could lead
to vmalloc-out-of-bounds writes when the rectangle coordinates extend
beyond the framebuffer's virtual resolution. This was detected by KASAN
and reported by syzkaller.
Add validation to:
1. Check that width and height are non-zero
2. Verify that dx and dy are within virtual resolution bounds
3. Clip the rectangle dimensions to fit within virtual resolution if needed
This follows the same pattern used in other framebuffer drivers like
pm2fb_fillrect.
Reported-by: syzbot+7a63ce155648954e749b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7a63ce155648954e749b
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
---
drivers/video/fbdev/core/sysfillrect.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/sysfillrect.c b/drivers/video/fbdev/core/sysfillrect.c
index 12eea3e424bb..73fc322ff8fd 100644
--- a/drivers/video/fbdev/core/sysfillrect.c
+++ b/drivers/video/fbdev/core/sysfillrect.c
@@ -7,6 +7,7 @@
#include <linux/module.h>
#include <linux/fb.h>
#include <linux/bitrev.h>
+#include <linux/string.h>
#include <asm/types.h>
#ifdef CONFIG_FB_SYS_REV_PIXELS_IN_BYTE
@@ -18,10 +19,28 @@
void sys_fillrect(struct fb_info *p, const struct fb_fillrect *rect)
{
+ struct fb_fillrect modded;
+ int vxres, vyres;
+
if (!(p->flags & FBINFO_VIRTFB))
fb_warn_once(p, "%s: framebuffer is not in virtual address space.\n", __func__);
- fb_fillrect(p, rect);
+ vxres = p->var.xres_virtual;
+ vyres = p->var.yres_virtual;
+
+ /* Validate and clip rectangle to virtual resolution */
+ if (!rect->width || !rect->height ||
+ rect->dx >= vxres || rect->dy >= vyres)
+ return;
+
+ memcpy(&modded, rect, sizeof(struct fb_fillrect));
+
+ if (modded.dx + modded.width > vxres)
+ modded.width = vxres - modded.dx;
+ if (modded.dy + modded.height > vyres)
+ modded.height = vyres - modded.dy;
+
+ fb_fillrect(p, &modded);
}
EXPORT_SYMBOL(sys_fillrect);
--
2.43.0
^ permalink raw reply related
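The clipping steps described in the patch above can be sketched in user space as follows. This is a minimal illustration, not the patch itself: struct rect_sketch and clip_rect_sketch() are hypothetical names, and the sketch assumes the rectangle fields are 32-bit unsigned values, as in the UAPI struct fb_fillrect. It writes the comparison as width > vxres - dx rather than dx + width > vxres, since the sum of two 32-bit unsigned values can wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the dx/dy/width/height fields of struct fb_fillrect. */
struct rect_sketch {
	uint32_t dx, dy, width, height;
};

/*
 * Clip r to a vxres x vyres area. Returns false when nothing is left to draw.
 * dx < vxres is established first, so vxres - dx cannot underflow, and
 * dx + width is never computed, so it cannot wrap.
 */
static bool clip_rect_sketch(struct rect_sketch *r, uint32_t vxres, uint32_t vyres)
{
	if (!r->width || !r->height || r->dx >= vxres || r->dy >= vyres)
		return false;

	if (r->width > vxres - r->dx)
		r->width = vxres - r->dx;
	if (r->height > vyres - r->dy)
		r->height = vyres - r->dy;

	return true;
}
```

A caller would draw only when the function returns true, using the clipped copy rather than the caller-supplied rectangle.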
* Re: [PATCH v2,stable/linux-6.6.y] fbdev: Fix out-of-bounds issue in sys_fillrect()
From: Gu Bowen @ 2026-01-17 7:45 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Daniel Vetter, Helge Deller, linux-fbdev, dri-devel, stable,
Lu Jialin
In-Reply-To: <2025121715-vindicate-valium-1118@gregkh>
Hi Greg,
On 12/17/2025 5:34 PM, Greg Kroah-Hartman wrote:
> On Wed, Dec 17, 2025 at 05:45:30PM +0800, Gu Bowen wrote:
>> This issue has already been fixed by commit eabb03293087 ("fbdev:
>> Refactoring the fbcon packed pixel drawing routines") on v6.15-rc1, but it
>> still exists in the stable version.
>
> Why not take the refactoring changes instead? That is almost always the
> proper thing to do, one-off changes are almost always wrong and cause
> extra work in the long-term.
>
> Please try backporting those changes instead please.
>
> thanks,
>
> greg k-h
As you've suggested, I understand the preference to keep stable branches
aligned with upstream when possible. However, I find that the
refactoring touches many areas of the codebase that have diverged
between mainline and stable-6.6, resulting in extensive merge conflicts.
In addition, I cannot be certain that backporting 3000+ lines of
refactoring code to a stable branch will not introduce unknown risks.
Given the current situation, I have another simpler patch solution that
is easy to maintain, and perhaps it could be merged into the stable branch:
void sys_fillrect(struct fb_info *p, const struct fb_fillrect *rect)
...
while (height--) {
dst += dst_idx >> (ffs(bits) - 1);
+ long dst_offset;
+ dst_offset = (unsigned long)dst - (unsigned long)p->screen_base;
+ if (dst_offset < 0 || dst_offset >= p->fix.smem_len)
+ return;
dst_idx &= (bits - 1);
fill_op32(p, dst, dst_idx, pat, width*bpp, bits);
...
BR,
Guber
^ permalink raw reply
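The check proposed above tests the starting offset of each scanline against smem_len. A related check that also covers the number of bytes about to be written could look like the sketch below. This is an assumption-laden illustration only: span_in_buffer() is a hypothetical helper, and how the byte length of a fill would be derived inside sys_fillrect() is not shown in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when the byte range [offset, offset + len) lies inside a
 * buffer of buf_len bytes. offset <= buf_len is tested first, so
 * buf_len - offset cannot underflow, and offset + len is never computed.
 */
static bool span_in_buffer(size_t offset, size_t len, size_t buf_len)
{
	return offset <= buf_len && len <= buf_len - offset;
}
```

With this shape, a zero-length range at the very end of the buffer is accepted, while any range that would cross the end is rejected.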
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Zack Rusin @ 2026-01-17 6:02 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <f3643c19-c250-4927-b39d-37d2494c7c84@suse.de>
On Fri, Jan 16, 2026 at 2:58 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 16.01.26 um 04:59 schrieb Zack Rusin:
> > On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >> That's really not going to work. For example, in the current series, you
> >> invoke devm_aperture_remove_conflicting_pci_devices_done() after
> >> drm_mode_reset(), drm_dev_register() and drm_client_setup().
> > That's perfectly fine,
> > devm_aperture_remove_conflicting_pci_devices_done is removing the
> > reload behavior not doing anything.
> >
> > This series, essentially, just adds a "defer" statement to
> > aperture_remove_conflicting_pci_devices that says
> >
> > "reload sysfb if this driver unloads".
> >
> > devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.
>
> Exactly. And if that reload happens after the hardware state has been
> changed, the result is undefined.
This is all predicated on drivers actually cleaning up after
themselves. I don't think any amount of good will or api design is
going to fix device specific state mismatches.
> The current recovery/reload is not reliable in any case. A number of
> high-profile devs have also said that it doesn't work with their driver.
> The same is true for ast. So the current approach is not going to happen.
>
> > There also might be the case of some crazy behavior, e.g. pci bar
> > resize in the driver makes the vga hardware crash or something, in
> > which case, yea, we should definitely skip this patch, at least until
> > those drivers properly cleanup on exit.
>
> There's nothing crazy here. It's standard probing code.
>
> If you want to move forward, my suggestion is to look at the proposal
> with the aperture_funcs callbacks that control sysfb device access. And
> from there, build a full prototype with one or two drivers.
I don't think that approach is going to work. I don't think there's
anything that can be done if drivers didn't clean up everything they've
done that might have broken sysfb on unload. I'm going to drop it
then. It's obviously a shame, because it works fine with virtualized
drivers and they're the ones that would likely profit from this the most,
but I'm sceptical that I could do a full system state reset in a
generalized fashion for hw drivers, or that the work required would be
worth the payoff.
z
^ permalink raw reply
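The "defer" and "cancel" behavior discussed in this thread can be modeled with a small state sketch. Everything here is hypothetical: struct probe_sketch and the three functions are illustrative names, not code from the series, and the model deliberately ignores hardware state, which is the point of contention in the discussion.

```c
#include <stdbool.h>

/* Hypothetical per-driver state; none of these names come from the series. */
struct probe_sketch {
	bool sysfb_active;	/* firmware framebuffer currently drives the display */
	bool restore_armed;	/* "reload sysfb if this driver unloads" */
};

/* Models removing the conflicting sysfb device and arming the deferred reload. */
static void remove_conflicting_sketch(struct probe_sketch *s)
{
	s->sysfb_active = false;
	s->restore_armed = true;
}

/* Models the "_done" call: probe succeeded, so cancel the deferred reload. */
static void remove_conflicting_done_sketch(struct probe_sketch *s)
{
	s->restore_armed = false;
}

/* Models driver teardown: run the deferred reload only if it is still armed. */
static void driver_unload_sketch(struct probe_sketch *s)
{
	if (s->restore_armed) {
		s->sysfb_active = true;
		s->restore_armed = false;
	}
}
```

In this model a probe that fails after the removal ends with sysfb_active set again, while a probe that reaches the "_done" call leaves it cleared on a later unload. Whether the real reload can succeed once the driver has touched the hardware is outside what the sketch captures.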
* [PATCH v7 2/2] staging: fbtft: Make framebuffer registration message debug-only
From: Chintan Patel @ 2026-01-17 4:29 UTC (permalink / raw)
To: linux-fbdev, linux-staging, linux-omap
Cc: linux-kernel, dri-devel, tzimmermann, andy, deller, gregkh,
Chintan Patel
In-Reply-To: <20260117042931.6088-1-chintanlike@gmail.com>
The framebuffer registration message is informational only and not
useful during normal operation. Convert it to debug-level logging to
keep the driver quiet when working correctly.
Signed-off-by: Chintan Patel <chintanlike@gmail.com>
---
drivers/staging/fbtft/fbtft-core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index 1b3b62950205..f427c0914907 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -792,11 +792,11 @@ int fbtft_register_framebuffer(struct fb_info *fb_info)
if (spi)
sprintf(text2, ", spi%d.%d at %d MHz", spi->controller->bus_num,
spi_get_chipselect(spi, 0), spi->max_speed_hz / 1000000);
- fb_info(fb_info,
- "%s frame buffer, %dx%d, %d KiB video memory%s, fps=%lu%s\n",
- fb_info->fix.id, fb_info->var.xres, fb_info->var.yres,
- fb_info->fix.smem_len >> 10, text1,
- HZ / fb_info->fbdefio->delay, text2);
+ fb_dbg(fb_info,
+ "%s frame buffer, %dx%d, %d KiB video memory%s, fps=%lu%s\n",
+ fb_info->fix.id, fb_info->var.xres, fb_info->var.yres,
+ fb_info->fix.smem_len >> 10, text1,
+ HZ / fb_info->fbdefio->delay, text2);
/* Turn on backlight if available */
if (fb_info->bl_dev) {
--
2.43.0
^ permalink raw reply related
* [PATCH v7 1/2] staging: fbtft: Fix build failure when CONFIG_FB_DEVICE=n
From: Chintan Patel @ 2026-01-17 4:29 UTC (permalink / raw)
To: linux-fbdev, linux-staging, linux-omap
Cc: linux-kernel, dri-devel, tzimmermann, andy, deller, gregkh,
Chintan Patel, kernel test robot
When CONFIG_FB_DEVICE is disabled, struct fb_info does
not provide a valid dev pointer. Direct dereferences of
fb_info->dev therefore result in build failures.
Fix this by avoiding direct accesses to fb_info->dev and
switching the affected debug logging to framebuffer helpers
that do not rely on a device pointer.
This fixes the following build failure reported by the
kernel test robot.
Fixes: a06d03f9f238 ("staging: fbtft: Make FB_DEVICE dependency optional")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601110740.Y9XK5HtN-lkp@intel.com
Signed-off-by: Chintan Patel <chintanlike@gmail.com>
---
Changes in v7:
- Split logging cleanups into a separate patch
- Limit this patch to the CONFIG_FB_DEVICE=n build fix only
Changes in v6:
- Switch debug/info logging to fb_dbg() and fb_info() (suggested by Thomas Zimmermann)
- Drop dev_of_fbinfo() usage in favor of framebuffer helpers that implicitly
handle the debug/info context.
- Drop __func__ usage per review feedback (suggested by greg k-h)
- Add Fixes tag for a06d03f9f238 ("staging: fbtft: Make FB_DEVICE dependency optional")
(suggested by Andy Shevchenko)
Changes in v5:
- Initial attempt to replace info->dev accesses using
dev_of_fbinfo() helper
drivers/staging/fbtft/fbtft-core.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index 8a5ccc8ae0a1..1b3b62950205 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -365,9 +365,9 @@ static int fbtft_fb_setcolreg(unsigned int regno, unsigned int red,
unsigned int val;
int ret = 1;
- dev_dbg(info->dev,
- "%s(regno=%u, red=0x%X, green=0x%X, blue=0x%X, trans=0x%X)\n",
- __func__, regno, red, green, blue, transp);
+ fb_dbg(info,
+ "regno=%u, red=0x%X, green=0x%X, blue=0x%X, trans=0x%X\n",
+ regno, red, green, blue, transp);
switch (info->fix.visual) {
case FB_VISUAL_TRUECOLOR:
@@ -391,8 +391,7 @@ static int fbtft_fb_blank(int blank, struct fb_info *info)
struct fbtft_par *par = info->par;
int ret = -EINVAL;
- dev_dbg(info->dev, "%s(blank=%d)\n",
- __func__, blank);
+ fb_dbg(info, "blank=%d\n", blank);
if (!par->fbtftops.blank)
return ret;
@@ -793,11 +792,11 @@ int fbtft_register_framebuffer(struct fb_info *fb_info)
if (spi)
sprintf(text2, ", spi%d.%d at %d MHz", spi->controller->bus_num,
spi_get_chipselect(spi, 0), spi->max_speed_hz / 1000000);
- dev_info(fb_info->dev,
- "%s frame buffer, %dx%d, %d KiB video memory%s, fps=%lu%s\n",
- fb_info->fix.id, fb_info->var.xres, fb_info->var.yres,
- fb_info->fix.smem_len >> 10, text1,
- HZ / fb_info->fbdefio->delay, text2);
+ fb_info(fb_info,
+ "%s frame buffer, %dx%d, %d KiB video memory%s, fps=%lu%s\n",
+ fb_info->fix.id, fb_info->var.xres, fb_info->var.yres,
+ fb_info->fix.smem_len >> 10, text1,
+ HZ / fb_info->fbdefio->delay, text2);
/* Turn on backlight if available */
if (fb_info->bl_dev) {
--
2.43.0
^ permalink raw reply related
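The reason helpers that take the fb_info itself avoid the build failure can be sketched outside the kernel. The sketch below assumes, as the commit message describes, that the dev member is only present when the optional device support is configured, and it further assumes the logging prefix can be derived from the framebuffer index instead. struct fb_info_sketch, SKETCH_FB_DEVICE and fb_prefix_sketch() are hypothetical names for illustration.

```c
#include <stdio.h>

/* Hypothetical, cut-down stand-in for struct fb_info. */
struct fb_info_sketch {
	int node;	/* framebuffer index, always present */
#ifdef SKETCH_FB_DEVICE
	void *dev;	/* only exists when the optional device support is built in */
#endif
};

/*
 * Build a log prefix from the node index alone. Because info->dev is never
 * named, this compiles whether or not SKETCH_FB_DEVICE is defined.
 */
static int fb_prefix_sketch(char *buf, size_t len, const struct fb_info_sketch *info)
{
	return snprintf(buf, len, "fb%d: ", info->node);
}
```

Code that spelled out info->dev would fail to compile in the configuration without the member, which mirrors the failure the patch addresses.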
* Re: [PATCH v3 7/7] arm64: dts: qcom: msm8953-xiaomi-daisy: fix backlight
From: Konrad Dybcio @ 2026-01-16 10:07 UTC (permalink / raw)
To: Barnabás Czémán, Lee Jones, Daniel Thompson,
Jingoo Han, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Bjorn Andersson, Kiran Gunda, Helge Deller,
Luca Weiss, Konrad Dybcio, Eugene Lepshy, Gianluca Boiano,
Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev
In-Reply-To: <20260116-pmi8950-wled-v3-7-e6c93de84079@mainlining.org>
On 1/16/26 8:07 AM, Barnabás Czémán wrote:
> The backlight on this device is connected via 3 strings. Currently,
> the DT claims only two are present, which results in visible stripes
> on the display (since every third backlight string remains unconfigured).
>
> Fix the number of strings to avoid that.
>
> Fixes: 38d779c26395 ("arm64: dts: qcom: msm8953: Add device tree for Xiaomi Mi A2 Lite")
> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
^ permalink raw reply
* Re: [PATCH v3 6/7] arm64: dts: qcom: msm8937-xiaomi-land: correct wled ovp value
From: Konrad Dybcio @ 2026-01-16 10:07 UTC (permalink / raw)
To: Barnabás Czémán, Lee Jones, Daniel Thompson,
Jingoo Han, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Bjorn Andersson, Kiran Gunda, Helge Deller,
Luca Weiss, Konrad Dybcio, Eugene Lepshy, Gianluca Boiano,
Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev
In-Reply-To: <20260116-pmi8950-wled-v3-6-e6c93de84079@mainlining.org>
On 1/16/26 8:07 AM, Barnabás Czémán wrote:
> PMI8950 doesn't actually support setting an OVP threshold value of
> 29.6 V. The closest allowed value is 29.5 V. Set that instead.
>
> Fixes: 2144f6d57d8e ("arm64: dts: qcom: Add Xiaomi Redmi 3S")
> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
^ permalink raw reply
* [PATCH] video: of_display_timing: fix refcount leak in of_get_display_timings()
From: Weigang He @ 2026-01-16 9:57 UTC (permalink / raw)
To: deller; +Cc: linux-fbdev, dri-devel, linux-kernel, Weigang He, stable
of_parse_phandle() returns a device_node with refcount incremented,
which is stored in 'entry' and then copied to 'native_mode'. When the
error paths at lines 184 or 192 jump to 'entryfail', native_mode's
refcount is not decremented, causing a refcount leak.
Fix this by changing the goto target from 'entryfail' to 'timingfail',
which properly calls of_node_put(native_mode) before cleanup.
Fixes: cc3f414cf2e4 ("video: add of helper for display timings/videomode")
Cc: stable@vger.kernel.org
Signed-off-by: Weigang He <geoffreyhe2@gmail.com>
---
drivers/video/of_display_timing.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/video/of_display_timing.c b/drivers/video/of_display_timing.c
index bebd371c6b93e..1940c9505dd3b 100644
--- a/drivers/video/of_display_timing.c
+++ b/drivers/video/of_display_timing.c
@@ -181,7 +181,7 @@ struct display_timings *of_get_display_timings(const struct device_node *np)
if (disp->num_timings == 0) {
/* should never happen, as entry was already found above */
pr_err("%pOF: no timings specified\n", np);
- goto entryfail;
+ goto timingfail;
}
disp->timings = kcalloc(disp->num_timings,
@@ -189,7 +189,7 @@ struct display_timings *of_get_display_timings(const struct device_node *np)
GFP_KERNEL);
if (!disp->timings) {
pr_err("%pOF: could not allocate timings array\n", np);
- goto entryfail;
+ goto timingfail;
}
disp->num_timings = 0;
--
2.34.1
^ permalink raw reply related
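The pattern behind this fix, where every exit taken after a reference is acquired passes through the label that releases it, can be sketched in user space. The names below (struct node_sketch, node_get(), node_put(), parse_sketch()) are hypothetical stand-ins and do not reproduce of_get_display_timings().

```c
#include <stdlib.h>

/* Hypothetical refcounted node standing in for struct device_node. */
struct node_sketch {
	int refcount;
};

static void node_get(struct node_sketch *n) { n->refcount++; }
static void node_put(struct node_sketch *n) { n->refcount--; }

/*
 * Takes a reference on n, then tries to allocate count ints.
 * Both error paths jump to put_node, so the reference taken by
 * node_get() is dropped on every exit.
 */
static int parse_sketch(struct node_sketch *n, size_t count)
{
	int ret = -1;
	int *timings;

	node_get(n);

	if (count == 0)
		goto put_node;

	timings = calloc(count, sizeof(*timings));
	if (!timings)
		goto put_node;

	ret = 0;
	free(timings);
put_node:
	node_put(n);
	return ret;
}
```

Jumping to a label placed after the node_put() call would return the same error code but leave the count one too high, which is the kind of leak the patch describes.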
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Thomas Zimmermann @ 2026-01-16 7:58 UTC (permalink / raw)
To: Zack Rusin
Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <CABQX2QMn_dTh2h44LRwB7+RxGqK3Jn+QCx38xWrzpNJG5SZ9-Q@mail.gmail.com>
Hi
Am 16.01.26 um 04:59 schrieb Zack Rusin:
> On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> That's really not going to work. For example, in the current series, you
>> invoke devm_aperture_remove_conflicting_pci_devices_done() after
>> drm_mode_reset(), drm_dev_register() and drm_client_setup().
> That's perfectly fine,
> devm_aperture_remove_conflicting_pci_devices_done is removing the
> reload behavior not doing anything.
>
> This series, essentially, just adds a "defer" statement to
> aperture_remove_conflicting_pci_devices that says
>
> "reload sysfb if this driver unloads".
>
> devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.
Exactly. And if that reload happens after the hardware state has been
changed, the result is undefined.
>
> You could ask why have
> devm_aperture_remove_conflicting_pci_devices_done at all then and it's
> because I didn't want to change the default behavior of anything.
>
> There are three cases:
> 1) Driver fails to load before
> aperture_remove_conflicting_pci_devices, in which case sysfb is still
> active and there's no problem,
> 2) Driver fails to load after aperture_remove_conflicting_pci_devices,
> in which case sysfb is gone and the screen is blank
> 3) Driver is unloaded after the probe succeeded. igt tests this too.
>
> Without devm_aperture_remove_conflicting_pci_devices_done we'd try to
> reload sysfb in #3, which, in general makes sense to me and I'd
> probably remove it in my drivers, but there might be people or tests
> (again, igt does it and we don't need to flip-flop between sysfb and
> the driver there) that depend on specifically that behavior of not
> having anything driving fb so I didn't want to change it.
>
> So with this series the worst case scenario is that the driver that
> failed after aperture_remove_conflicting_pci_devices changed the
> hardware state so much that sysfb can't recover and the fb is blank.
> So it was blank before and this series can't fix it because the driver
> in its cleanup routine will need to do more unwinding for sysfb to
> reload (i.e. we'd need an extra patch to unwind the driver state).
The current recovery/reload is not reliable in any case. A number of
high-profile devs have also said that it doesn't work with their driver.
The same is true for ast. So the current approach is not going to happen.
> There also might be the case of some crazy behavior, e.g. pci bar
> resize in the driver makes the vga hardware crash or something, in
> which case, yea, we should definitely skip this patch, at least until
> those drivers properly cleanup on exit.
There's nothing crazy here. It's standard probing code.
If you want to move forward, my suggestion is to look at the proposal
with the aperture_funcs callbacks that control sysfb device access. And
from there, build a full prototype with one or two drivers.
Best regards
Thomas
>
> z
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
^ permalink raw reply
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Thomas Zimmermann @ 2026-01-16 7:39 UTC (permalink / raw)
To: Ville Syrjälä, Christian König
Cc: Zack Rusin, dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel,
Ce Sun, Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <aWkDYO1o9T1BhvXj@intel.com>
Hi
Am 15.01.26 um 16:10 schrieb Ville Syrjälä:
> On Thu, Jan 15, 2026 at 03:39:00PM +0100, Christian König wrote:
>> Sorry for being late, but I only now realized what you are doing here.
>>
>> On 1/15/26 12:02, Thomas Zimmermann wrote:
>>> Hi,
>>>
>>> apologies for the delay. I wanted to reply and then forgot about it.
>>>
>>> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>>>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>>> Hi
>>>>>
>>>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>>>
>>>>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>>>>> access to PCI resources so if the probe fails the system is left without
>>>>>> a functioning display driver.
>>>>>>
>>>>>> Add code to sysfb to recover system framebuffer when DRM driver's probe
>>>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>>>> framebuffer driver.
>>>>>>
>>>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>>>> it still tries to load the vendor specific driver which ends up usually
>>>>>> not working at all. With simpledrm the system recovers really nicely
>>>>>> ending up with a working console and not a blank screen.
>>>>>>
>>>>>> There's a caveat in that some hardware might require some special magic
>>>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>>>> maintainers could introduce a temporary failure in their drivers
>>>>>> probe to validate that the sysfb recovers and they get a working console.
>>>>>> The easiest way to double check it is by adding:
>>>>>> /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>>> dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>>> ret = -EINVAL;
>>>>>> goto out_error;
>>>>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
>>>>> Recovering the display like that is guess work and will at best work
>>>>> with simple discrete devices where the framebuffer is always located in
>>>>> a confined graphics aperture.
>>>>>
>>>>> But the problem you're trying to solve is a real one.
>>>>>
>>>>> What we'd want to do instead is to take the initial hardware state into
>>>>> account when we do the initial mode-setting operation.
>>>>>
>>>>> The first step is to move each driver's remove_conflicting_devices call
>>>>> to the latest possible location in the probe function. We usually do it
>>>>> first, because that's easy. But on most hardware, it could happen much
>>>>> later.
>>>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
>>>> they request pci regions which is going to fail otherwise. Because
>>>> grabbing the pci resources is in general the very first thing that
>>>> those drivers need to do to setup anything, we
>>>> remove_conflicting_devices first or at least very early.
>>> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
>> Nope that is not correct.
>>
>> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.
>>
>> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
>>
>> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.
> It's similar for Intel. For us VGA emulation won't be used for
> EFI boot, but we still can't have the previous driver poking
> around in memory while the real driver is initializing. The
> entire memory layout may get completely shuffled so there's
> no telling where such memory accesses would land.
Isn't there code in display/intel_fbdev.c that reads back the old state
from hardware before initializing fbdev? [1] How does that work then?
Wouldn't the HW state be invalid already?
Best regards
Thomas
[1]
https://elixir.bootlin.com/linux/v6.18.5/source/drivers/gpu/drm/i915/display/intel_fbdev.c#L356
>
> And I suppose reBAR is a concern for us as well.
>
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
^ permalink raw reply
* [PATCH v3 7/7] arm64: dts: qcom: msm8953-xiaomi-daisy: fix backlight
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
The backlight on this device is connected via 3 strings. Currently,
the DT claims only two are present, which results in visible stripes
on the display (since every third backlight string remains unconfigured).
Fix the number of strings to avoid that.
Fixes: 38d779c26395 ("arm64: dts: qcom: msm8953: Add device tree for Xiaomi Mi A2 Lite")
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
arch/arm64/boot/dts/qcom/msm8953-xiaomi-daisy.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-daisy.dts b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-daisy.dts
index ddd7af616794..59f873a06e4d 100644
--- a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-daisy.dts
+++ b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-daisy.dts
@@ -157,7 +157,7 @@ &pm8953_resin {
&pmi8950_wled {
qcom,current-limit-microamp = <20000>;
- qcom,num-strings = <2>;
+ qcom,num-strings = <3>;
status = "okay";
};
--
2.52.0
^ permalink raw reply related
* [PATCH v3 0/7] Fix PMI8994 WLED ovp values and more
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán, Krzysztof Kozlowski
This patch series fixes the supported OVP values for the PMI8994 WLED
and applies the same configuration to the PMI8950 WLED.
It also corrects WLED-related properties in xiaomi-daisy, xiaomi-land and
xiaomi-vince.
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
Changes in v3:
- pmi8950: reword the commit to make it clearer
- Link to v2: https://lore.kernel.org/r/20260108-pmi8950-wled-v2-0-8687f23147d7@mainlining.org
Changes in v2:
- Rework ovp change to support pmi8994 also.
- Reword commits.
- dt-bindings: Set min max for qcom,ovp-millivolt.
- Link to v1: https://lore.kernel.org/r/20260107-pmi8950-wled-v1-0-5e52f5caa39c@mainlining.org
---
Barnabás Czémán (7):
dt-bindings: backlight: qcom-wled: Document ovp values for PMI8994
backlight: qcom-wled: Support ovp values for PMI8994
dt-bindings: backlight: qcom-wled: Document ovp values for PMI8950
backlight: qcom-wled: Change PM8950 WLED configurations
arm64: dts: qcom: msm8953-xiaomi-vince: correct wled ovp value
arm64: dts: qcom: msm8937-xiaomi-land: correct wled ovp value
arm64: dts: qcom: msm8953-xiaomi-daisy: fix backlight
.../bindings/leds/backlight/qcom-wled.yaml | 24 +++++++++++--
arch/arm64/boot/dts/qcom/msm8937-xiaomi-land.dts | 2 +-
arch/arm64/boot/dts/qcom/msm8953-xiaomi-daisy.dts | 2 +-
arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts | 2 +-
drivers/video/backlight/qcom-wled.c | 42 ++++++++++++++++++++--
5 files changed, 65 insertions(+), 7 deletions(-)
---
base-commit: f96074c6d01d8a5e9e2fccd0bba5f2ed654c1f2d
change-id: 20260107-pmi8950-wled-b014578f67a6
Best regards,
--
Barnabás Czémán <barnabas.czeman@mainlining.org>
^ permalink raw reply
* [PATCH v3 1/7] dt-bindings: backlight: qcom-wled: Document ovp values for PMI8994
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán, Krzysztof Kozlowski
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
Document ovp values supported by wled found in PMI8994.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
.../bindings/leds/backlight/qcom-wled.yaml | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml
index a8490781011d..19166186a1ff 100644
--- a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml
+++ b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml
@@ -98,8 +98,8 @@ properties:
description: |
Over-voltage protection limit. This property is for WLED4 only.
$ref: /schemas/types.yaml#/definitions/uint32
- enum: [ 18100, 19600, 29600, 31100 ]
- default: 29600
+ minimum: 17800
+ maximum: 31100
qcom,num-strings:
description: |
@@ -239,6 +239,24 @@ allOf:
minimum: 0
maximum: 4095
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: qcom,pmi8994-wled
+
+ then:
+ properties:
+ qcom,ovp-millivolt:
+ enum: [ 17800, 19400, 29500, 31000 ]
+ default: 29500
+
+ else:
+ properties:
+ qcom,ovp-millivolt:
+ enum: [ 18100, 19600, 29600, 31100 ]
+ default: 29600
+
required:
- compatible
- reg
--
2.52.0
^ permalink raw reply related
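The effect of the per-compatible enums in this binding can be illustrated with a small lookup sketch. The two tables copy the millivolt values from the patch above. ovp_index_sketch() is a hypothetical helper, and treating the table position as the selector the hardware receives is an assumption for illustration, not a statement about the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* OVP thresholds in millivolts, taken from the binding enums in the patch. */
static const uint32_t pmi8994_ovp_mv[] = { 17800, 19400, 29500, 31000 };
static const uint32_t default_ovp_mv[] = { 18100, 19600, 29600, 31100 };

/* Return the index of mv in table, or -1 when the value is not supported. */
static int ovp_index_sketch(const uint32_t *table, size_t n, uint32_t mv)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] == mv)
			return (int)i;
	return -1;
}
```

Looking up 29600 in the PMI8994 table yields no match, whereas 29500 does, which is the mismatch the device tree patches in this series correct for boards that previously requested 29.6 V.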
* [PATCH v3 4/7] backlight: qcom-wled: Change PM8950 WLED configurations
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
PMI8950 WLED needs the same configuration as PMI8994 WLED.
Fixes: 10258bf4534b ("backlight: qcom-wled: Add PMI8950 compatible")
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
drivers/video/backlight/qcom-wled.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/video/backlight/qcom-wled.c b/drivers/video/backlight/qcom-wled.c
index 5decbd39b789..8054e4787725 100644
--- a/drivers/video/backlight/qcom-wled.c
+++ b/drivers/video/backlight/qcom-wled.c
@@ -1455,7 +1455,8 @@ static int wled_configure(struct wled *wled)
break;
case 4:
- if (of_device_is_compatible(dev->of_node, "qcom,pmi8994-wled")) {
+ if (of_device_is_compatible(dev->of_node, "qcom,pmi8950-wled") ||
+ of_device_is_compatible(dev->of_node, "qcom,pmi8994-wled")) {
u32_opts = pmi8994_wled_opts;
size = ARRAY_SIZE(pmi8994_wled_opts);
} else {
--
2.52.0
* [PATCH v3 6/7] arm64: dts: qcom: msm8937-xiaomi-land: correct wled ovp value
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
PMI8950 doesn't actually support setting an OVP threshold value of
29.6 V. The closest allowed value is 29.5 V. Set that instead.
Fixes: 2144f6d57d8e ("arm64: dts: qcom: Add Xiaomi Redmi 3S")
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
arch/arm64/boot/dts/qcom/msm8937-xiaomi-land.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8937-xiaomi-land.dts b/arch/arm64/boot/dts/qcom/msm8937-xiaomi-land.dts
index 91837ff940f1..4f301e7c6517 100644
--- a/arch/arm64/boot/dts/qcom/msm8937-xiaomi-land.dts
+++ b/arch/arm64/boot/dts/qcom/msm8937-xiaomi-land.dts
@@ -178,7 +178,7 @@ &pmi8950_wled {
qcom,num-strings = <2>;
qcom,external-pfet;
qcom,current-limit-microamp = <20000>;
- qcom,ovp-millivolt = <29600>;
+ qcom,ovp-millivolt = <29500>;
status = "okay";
};
--
2.52.0
* [PATCH v3 5/7] arm64: dts: qcom: msm8953-xiaomi-vince: correct wled ovp value
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
PMI8950 doesn't actually support setting an OVP threshold value of
29.6 V. The closest allowed value is 29.5 V. Set that instead.
Fixes: aa17e707e04a ("arm64: dts: qcom: msm8953: Add device tree for Xiaomi Redmi 5 Plus")
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts
index d46325e79917..c2a290bf493c 100644
--- a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts
+++ b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts
@@ -169,7 +169,7 @@ &pm8953_resin {
&pmi8950_wled {
qcom,current-limit-microamp = <20000>;
- qcom,ovp-millivolt = <29600>;
+ qcom,ovp-millivolt = <29500>;
qcom,num-strings = <2>;
qcom,external-pfet;
qcom,cabc;
--
2.52.0
* [PATCH v3 3/7] dt-bindings: backlight: qcom-wled: Document ovp values for PMI8950
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán, Krzysztof Kozlowski
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
Document ovp values supported by wled found in PMI8950.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml
index 19166186a1ff..a54448cfdb38 100644
--- a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml
+++ b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.yaml
@@ -243,7 +243,9 @@ allOf:
properties:
compatible:
contains:
- const: qcom,pmi8994-wled
+ enum:
+ - qcom,pmi8950-wled
+ - qcom,pmi8994-wled
then:
properties:
--
2.52.0
* [PATCH v3 2/7] backlight: qcom-wled: Support ovp values for PMI8994
From: Barnabás Czémán @ 2026-01-16 7:07 UTC (permalink / raw)
To: Lee Jones, Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Kiran Gunda,
Helge Deller, Luca Weiss, Konrad Dybcio, Eugene Lepshy,
Gianluca Boiano, Alejandro Tafalla
Cc: dri-devel, linux-leds, devicetree, linux-kernel, Daniel Thompson,
linux-arm-msm, linux-fbdev, Konrad Dybcio,
Barnabás Czémán
In-Reply-To: <20260116-pmi8950-wled-v3-0-e6c93de84079@mainlining.org>
WLED4 found in PMI8994 supports different ovp values.
Fixes: 6fc632d3e3e0 ("video: backlight: qcom-wled: Add PMI8994 compatible")
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
---
drivers/video/backlight/qcom-wled.c | 41 +++++++++++++++++++++++++++++++++++--
1 file changed, 39 insertions(+), 2 deletions(-)
diff --git a/drivers/video/backlight/qcom-wled.c b/drivers/video/backlight/qcom-wled.c
index a63bb42c8f8b..5decbd39b789 100644
--- a/drivers/video/backlight/qcom-wled.c
+++ b/drivers/video/backlight/qcom-wled.c
@@ -1244,6 +1244,15 @@ static const struct wled_var_cfg wled4_ovp_cfg = {
.size = ARRAY_SIZE(wled4_ovp_values),
};
+static const u32 pmi8994_wled_ovp_values[] = {
+ 31000, 29500, 19400, 17800,
+};
+
+static const struct wled_var_cfg pmi8994_wled_ovp_cfg = {
+ .values = pmi8994_wled_ovp_values,
+ .size = ARRAY_SIZE(pmi8994_wled_ovp_values),
+};
+
static inline u32 wled5_ovp_values_fn(u32 idx)
{
/*
@@ -1357,6 +1366,29 @@ static int wled_configure(struct wled *wled)
},
};
+ const struct wled_u32_opts pmi8994_wled_opts[] = {
+ {
+ .name = "qcom,current-boost-limit",
+ .val_ptr = &cfg->boost_i_limit,
+ .cfg = &wled4_boost_i_limit_cfg,
+ },
+ {
+ .name = "qcom,current-limit-microamp",
+ .val_ptr = &cfg->string_i_limit,
+ .cfg = &wled4_string_i_limit_cfg,
+ },
+ {
+ .name = "qcom,ovp-millivolt",
+ .val_ptr = &cfg->ovp,
+ .cfg = &pmi8994_wled_ovp_cfg,
+ },
+ {
+ .name = "qcom,switching-freq",
+ .val_ptr = &cfg->switch_freq,
+ .cfg = &wled3_switch_freq_cfg,
+ },
+ };
+
const struct wled_u32_opts wled5_opts[] = {
{
.name = "qcom,current-boost-limit",
@@ -1423,8 +1455,13 @@ static int wled_configure(struct wled *wled)
break;
case 4:
- u32_opts = wled4_opts;
- size = ARRAY_SIZE(wled4_opts);
+ if (of_device_is_compatible(dev->of_node, "qcom,pmi8994-wled")) {
+ u32_opts = pmi8994_wled_opts;
+ size = ARRAY_SIZE(pmi8994_wled_opts);
+ } else {
+ u32_opts = wled4_opts;
+ size = ARRAY_SIZE(wled4_opts);
+ }
*cfg = wled4_config_defaults;
wled->wled_set_brightness = wled4_set_brightness;
wled->wled_sync_toggle = wled3_sync_toggle;
--
2.52.0
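For readers following the series, the effect of a per-PMIC value table can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the driver's code: ovp_index() is a hypothetical helper, and the generic WLED4 table is assumed to hold the values listed in the binding's enum. It shows why 29600 has no match in the PMI8994 table while 29500 does.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the patch above (millivolts), in the order given there. */
static const unsigned int pmi8994_ovp_mv[] = { 31000, 29500, 19400, 17800 };

/* Assumed generic WLED4 values, taken from the binding's enum and
 * ordered to mirror the table above. */
static const unsigned int wled4_ovp_mv[] = { 31100, 29600, 19600, 18100 };

#define TABLE_SIZE(t) (sizeof(t) / sizeof((t)[0]))

/*
 * Hypothetical helper: return the index of mv in table, or -1 if the
 * value is not present in the table.
 */
static int ovp_index(const unsigned int *table, size_t size, unsigned int mv)
{
	size_t i;

	assert(table != NULL);

	for (i = 0; i < size; i++) {
		if (table[i] == mv)
			return (int)i;
	}
	return -1;
}
```

With tables like these, a devicetree that sets 29600 on a PMI8950 or PMI8994 finds no entry, which is consistent with the two dts fixes in this series switching to 29500.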
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Zack Rusin @ 2026-01-16 3:59 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
Chia-I Wu, Christian König, Danilo Krummrich, Dave Airlie,
Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <97993761-5884-4ada-b345-9fb64819e02a@suse.de>
On Thu, Jan 15, 2026 at 6:02 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> That's really not going to work. For example, in the current series, you
> invoke devm_aperture_remove_conflicting_pci_devices_done() after
> drm_mode_reset(), drm_dev_register() and drm_client_setup().
That's perfectly fine;
devm_aperture_remove_conflicting_pci_devices_done only removes the
reload behavior, it does not do anything else.
This series, essentially, just adds a "defer" statement to
aperture_remove_conflicting_pci_devices that says
"reload sysfb if this driver unloads".
devm_aperture_remove_conflicting_pci_devices_done just cancels that defer.
You could ask why have
devm_aperture_remove_conflicting_pci_devices_done at all then and it's
because I didn't want to change the default behavior of anything.
There are three cases:
1) Driver fails to load before
aperture_remove_conflicting_pci_devices, in which case sysfb is still
active and there's no problem,
2) Driver fails to load after aperture_remove_conflicting_pci_devices,
in which case sysfb is gone and the screen is blank
3) Driver is unloaded after the probe succeeded. igt tests this too.
Without devm_aperture_remove_conflicting_pci_devices_done we'd try to
reload sysfb in #3, which, in general makes sense to me and I'd
probably remove it in my drivers, but there might be people or tests
(again, igt does it and we don't need to flip-flop between sysfb and
the driver there) that depend on specifically that behavior of not
having anything driving fb so I didn't want to change it.
So with this series the worst case scenario is that the driver that
failed after aperture_remove_conflicting_pci_devices changed the
hardware state so much that sysfb can't recover and the fb is blank.
So it was blank before and this series can't fix it because the driver
in its cleanup routine will need to do more unwinding for sysfb to
reload (i.e. we'd need an extra patch to unwind the driver state).
There also might be the case of some crazy behavior, e.g. pci bar
resize in the driver makes the vga hardware crash or something, in
which case, yea, we should definitely skip this patch, at least until
those drivers properly cleanup on exit.
z
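The "defer" semantics and the three cases described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every name below is hypothetical, and the real series operates on devices and devres actions rather than on two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the display state during a native driver probe. */
struct display_model {
	bool sysfb_active;   /* firmware framebuffer is driving the display */
	bool restore_armed;  /* "reload sysfb on unload" action is pending */
};

/* Models aperture removal: sysfb goes away and the restore is armed. */
static void model_remove_conflicting(struct display_model *d)
{
	assert(d != NULL);
	d->sysfb_active = false;
	d->restore_armed = true;
}

/* Models the _done() call: cancel the pending restore, nothing else. */
static void model_remove_done(struct display_model *d)
{
	assert(d != NULL);
	d->restore_armed = false;
}

/* Models driver teardown, whether from a probe failure or an unload. */
static void model_driver_release(struct display_model *d)
{
	assert(d != NULL);
	if (d->restore_armed) {
		d->sysfb_active = true;
		d->restore_armed = false;
	}
}

enum fail_point { FAIL_BEFORE_REMOVE, FAIL_AFTER_REMOVE, UNLOAD_AFTER_PROBE };

/* Runs one scenario and reports whether sysfb is active afterwards. */
static bool sysfb_active_after(enum fail_point when)
{
	struct display_model d = { .sysfb_active = true, .restore_armed = false };

	if (when == FAIL_BEFORE_REMOVE) {
		model_driver_release(&d);
		return d.sysfb_active;
	}

	model_remove_conflicting(&d);
	if (when == FAIL_AFTER_REMOVE) {
		model_driver_release(&d);
		return d.sysfb_active;
	}

	model_remove_done(&d);
	model_driver_release(&d);
	return d.sysfb_active;
}
```

In this model only the unload after a successful probe leaves sysfb inactive, which corresponds to keeping today's default behavior for case 3. The model says nothing about whether the hardware is still in a state sysfb can use, which is the open question in this thread.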
* Re: [PATCH v6] staging: fbtft: Use fbdev logging helpers when FB_DEVICE is disabled
From: Chintan Patel @ 2026-01-16 2:59 UTC (permalink / raw)
To: Thomas Zimmermann, linux-fbdev, linux-staging, linux-omap
Cc: linux-kernel, dri-devel, andy, deller, gregkh, kernel test robot
In-Reply-To: <1b83803a-b51f-4cc0-a836-b4417bfd6537@suse.de>
On 1/14/26 23:55, Thomas Zimmermann wrote:
> Hi
>
> Am 13.01.26 um 05:59 schrieb Chintan Patel:
>> Replace direct accesses to info->dev with fb_dbg() and fb_info()
>> helpers to avoid build failures when CONFIG_FB_DEVICE=n.
>>
>> Fixes: a06d03f9f238 ("staging: fbtft: Make FB_DEVICE dependency
>> optional")
>> Reported-by: kernel test robot <lkp@intel.com>
>> Closes: https://lore.kernel.org/oe-kbuild-all/202601110740.Y9XK5HtN-
>> lkp@intel.com
>> Signed-off-by: Chintan Patel <chintanlike@gmail.com>
>>
>> Changes in v6:
>> - Switch debug/info logging to fb_dbg() and fb_info()(suggested by
>> Thomas Zimmermann)
>> - Drop dev_of_fbinfo() usage in favor of framebuffer helpers that
>> implicitly
>> handle the debug/info context.
>> - Drop __func__ usage per review feedback(suggested by greg k-h)
>> - Add Fixes tag for a06d03f9f238 ("staging: fbtft: Make FB_DEVICE
>> dependency optional")
>> (suggested by Andy Shevchenko)
>>
>> Changes in v5:
>> - Initial attempt to replace info->dev accesses using
>> dev_of_fbinfo() helper
>> ---
>> drivers/staging/fbtft/fbtft-core.c | 19 +++++++++----------
>> 1 file changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/
>> fbtft/fbtft-core.c
>> index 8a5ccc8ae0a1..1b3b62950205 100644
>> --- a/drivers/staging/fbtft/fbtft-core.c
>> +++ b/drivers/staging/fbtft/fbtft-core.c
>> @@ -365,9 +365,9 @@ static int fbtft_fb_setcolreg(unsigned int regno,
>> unsigned int red,
>> unsigned int val;
>> int ret = 1;
>> - dev_dbg(info->dev,
>> - "%s(regno=%u, red=0x%X, green=0x%X, blue=0x%X, trans=0x%X)\n",
>> - __func__, regno, red, green, blue, transp);
>> + fb_dbg(info,
>> + "regno=%u, red=0x%X, green=0x%X, blue=0x%X, trans=0x%X\n",
>> + regno, red, green, blue, transp);
>> switch (info->fix.visual) {
>> case FB_VISUAL_TRUECOLOR:
>> @@ -391,8 +391,7 @@ static int fbtft_fb_blank(int blank, struct
>> fb_info *info)
>> struct fbtft_par *par = info->par;
>> int ret = -EINVAL;
>> - dev_dbg(info->dev, "%s(blank=%d)\n",
>> - __func__, blank);
>> + fb_dbg(info, "blank=%d\n", blank);
>> if (!par->fbtftops.blank)
>> return ret;
>> @@ -793,11 +792,11 @@ int fbtft_register_framebuffer(struct fb_info
>> *fb_info)
>> if (spi)
>> sprintf(text2, ", spi%d.%d at %d MHz", spi->controller-
>> >bus_num,
>> spi_get_chipselect(spi, 0), spi->max_speed_hz / 1000000);
>> - dev_info(fb_info->dev,
>> - "%s frame buffer, %dx%d, %d KiB video memory%s, fps=%lu%s\n",
>> - fb_info->fix.id, fb_info->var.xres, fb_info->var.yres,
>> - fb_info->fix.smem_len >> 10, text1,
>> - HZ / fb_info->fbdefio->delay, text2);
>> + fb_info(fb_info,
>> + "%s frame buffer, %dx%d, %d KiB video memory%s, fps=%lu%s\n",
>> + fb_info->fix.id, fb_info->var.xres, fb_info->var.yres,
>> + fb_info->fix.smem_len >> 10, text1,
>> + HZ / fb_info->fbdefio->delay, text2);
>
> As discussed before, this should become fb_dbg(). Drivers should not
> print status reports unless they do not work as expected.
Agreed. I will send a two-patch series as per the feedback:
1) a patch focused purely on fixing the compilation issue by avoiding
info->dev dereferences (using fb_dbg() where logging remains), and
2) a follow-up cleanup that downgrades the framebuffer
registration message to debug level.
> Best regards
> Thomas
>
>> /* Turn on backlight if available */
>> if (fb_info->bl_dev) {
>
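The idea behind the change discussed above, logging through the framebuffer object instead of dereferencing info->dev, can be sketched in userspace. The names below are hypothetical, and the "fbN: " prefix is an assumption about how such helpers identify the framebuffer; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct fb_info; only the index is modelled. */
struct fb_model {
	int node;
};

/*
 * Format a message prefixed with the framebuffer index, without going
 * through a device pointer. Returns the number of characters the full
 * message needs (excluding the terminating NUL), or -1 on error or if
 * the prefix alone does not fit.
 */
static int fb_model_log(char *buf, size_t len, const struct fb_model *fb,
			const char *fmt, ...)
{
	va_list ap;
	int prefix, body;

	assert(buf != NULL && fb != NULL && fmt != NULL);

	prefix = snprintf(buf, len, "fb%d: ", fb->node);
	if (prefix < 0 || (size_t)prefix >= len)
		return -1;

	va_start(ap, fmt);
	body = vsnprintf(buf + prefix, len - (size_t)prefix, fmt, ap);
	va_end(ap);

	return body < 0 ? -1 : prefix + body;
}
```

Because the helper never touches a device pointer, it builds the same way whether or not a device is attached to the framebuffer, which is the property the fix relies on when CONFIG_FB_DEVICE=n.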
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Mario Limonciello @ 2026-01-15 16:39 UTC (permalink / raw)
To: Gerd Hoffmann, Ville Syrjälä
Cc: Christian König, Thomas Zimmermann, Zack Rusin, dri-devel,
Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
Danilo Krummrich, Dave Airlie, Deepak Rawat, Dmitry Osipenko,
Gurchetan Singh, Hans de Goede, Hawking Zhang, Helge Deller,
intel-gfx, intel-xe, Jani Nikula, Javier Martinez Canillas,
Jocelyn Falempe, Joonas Lahtinen, Lijo Lazar, linux-efi,
linux-fbdev, linux-hyperv, linux-kernel, Lucas De Marchi,
Lyude Paul, Maarten Lankhorst, Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <aWkWSnJ7Xn6ukW-b@sirius.home.kraxel.org>
On 1/15/26 10:36 AM, Gerd Hoffmann wrote:
> Hi,
>
>>> At least for AMD GPUs remove_conflicting_devices() really early is
>>> necessary because otherwise some operations just result in a
>>> spontaneous system reboot.
>
>> It's similar for Intel. For us VGA emulation won't be used for EFI
>> boot, but we still can't have the previous driver poking around in
>> memory while the real driver is initializing. The entire memory layout
>> may get completely shuffled so there's no telling where such memory
>> accesses would land.
>
> Can you do stuff like checking which firmware is needed and whether
> that can be loaded from the filesystem before calling
> remove_conflicting_devices() ?
>
That's something that I did in amdgpu a few years back.
I pushed the identification and ability to load firmware into early init
stages. It means that if you have a brand new GPU and run a modern
kernel with an older linux-firmware snapshot amdgpu will fail probe and
your framebuffer from EFI keeps working.
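The ordering described here can be sketched as a tiny userspace model. It is not amdgpu code: the structure and function names are hypothetical, and "firmware available" stands in for whatever early checks a driver can make without touching the display hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe inputs and display ownership state. */
struct probe_model {
	bool firmware_available;  /* required firmware can be loaded */
	bool sysfb_active;        /* firmware framebuffer owns the display */
};

/* Returns 0 on success, -1 if probe bails out early. */
static int model_probe(struct probe_model *p)
{
	assert(p != NULL);

	/* Early init: identify the device and check firmware first. */
	if (!p->firmware_available)
		return -1;	/* sysfb has not been touched */

	/* Only now take the hardware away from the firmware framebuffer. */
	p->sysfb_active = false;
	return 0;
}
```

A failure in the early check leaves the firmware framebuffer running, so this class of probe failure never needs a recovery path.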
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Gerd Hoffmann @ 2026-01-15 16:36 UTC (permalink / raw)
To: Ville Syrjälä
Cc: Christian König, Thomas Zimmermann, Zack Rusin, dri-devel,
Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
Danilo Krummrich, Dave Airlie, Deepak Rawat, Dmitry Osipenko,
Gurchetan Singh, Hans de Goede, Hawking Zhang, Helge Deller,
intel-gfx, intel-xe, Jani Nikula, Javier Martinez Canillas,
Jocelyn Falempe, Joonas Lahtinen, Lijo Lazar, linux-efi,
linux-fbdev, linux-hyperv, linux-kernel, Lucas De Marchi,
Lyude Paul, Maarten Lankhorst, Mario Limonciello (AMD),
Mario Limonciello, Maxime Ripard, nouveau, Rodrigo Vivi,
Simona Vetter, spice-devel, Thomas Hellström,
Timur Kristóf, Tvrtko Ursulin, virtualization,
Vitaly Prosyak
In-Reply-To: <aWkDYO1o9T1BhvXj@intel.com>
Hi,
> > At least for AMD GPUs remove_conflicting_devices() really early is
> > necessary because otherwise some operations just result in a
> > spontaneous system reboot.
> It's similar for Intel. For us VGA emulation won't be used for EFI
> boot, but we still can't have the previous driver poking around in
> memory while the real driver is initializing. The entire memory layout
> may get completely shuffled so there's no telling where such memory
> accesses would land.
Can you do stuff like checking which firmware is needed and whether
that can be loaded from the filesystem before calling
remove_conflicting_devices() ?
take care,
Gerd
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Christian König @ 2026-01-15 15:58 UTC (permalink / raw)
To: Thomas Zimmermann, Zack Rusin
Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <4ee824d5-8ea0-4ae1-8bcb-5f8cbae37fc8@suse.de>
On 1/15/26 15:54, Thomas Zimmermann wrote:
> Hi
>
> Am 15.01.26 um 15:39 schrieb Christian König:
>> Sorry for being late, but I only now realized what you are doing here.
>>
>> On 1/15/26 12:02, Thomas Zimmermann wrote:
>>> Hi,
>>>
>>> apologies for the delay. I wanted to reply and then forgot about it.
>>>
>>> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>>>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>>> Hi
>>>>>
>>>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>>>
>>>>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>>>>> access to PCI resources so if the probe fails the system is left without
>>>>>> a functioning display driver.
>>>>>>
>>>>>> Add code to sysfb to recover system framebuffer when DRM driver's probe
>>>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>>>> framebuffer driver.
>>>>>>
>>>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>>>> it still tries to load the vendor specific driver which ends up usually
>>>>>> not working at all. With simpledrm the system recovers really nicely
>>>>>> ending up with a working console and not a blank screen.
>>>>>>
>>>>>> There's a caveat in that some hardware might require some special magic
>>>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>>>> maintainers could introduce a temporary failure in their drivers
>>>>>> probe to validate that the sysfb recovers and they get a working console.
>>>>>> The easiest way to double check it is by adding:
>>>>>> /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>>> dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>>> ret = -EINVAL;
>>>>>> goto out_error;
>>>>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
>>>>> Recovering the display like that is guess work and will at best work
>>>>> with simple discrete devices where the framebuffer is always located in
>>>>> a confined graphics aperture.
>>>>>
>>>>> But the problem you're trying to solve is a real one.
>>>>>
>>>>> What we'd want to do instead is to take the initial hardware state into
>>>>> account when we do the initial mode-setting operation.
>>>>>
>>>>> The first step is to move each driver's remove_conflicting_devices call
>>>>> to the latest possible location in the probe function. We usually do it
>>>>> first, because that's easy. But on most hardware, it could happen much
>>>>> later.
>>>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
>>>> they request pci regions which is going to fail otherwise. Because
>>>> grabbing the pci resources is in general the very first thing that
>>>> those drivers need to do to setup anything, we
>>>> remove_conflicting_devices first or at least very early.
>>> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
>> Nope that is not correct.
>>
>> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.
>
> Here I was only talking about avoiding calls to request_resource() and similar interfaces.
>
>>
>> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
>
> Yeah, that's what I expected.
>
>>
>> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.
>
> Assuming the driver (or driver author) is careful, is it possible to only read state from AMD hardware at such an early time?
I'm not an expert for that particular stuff but I strongly don't think so.
Basically the VGA emulation is firmware which "owns" the CRTC registers and might modify them at any time unless it's turned off first.
So you can't even use data/index pairs of registers etc...
> We usually do remove_conflicting_devices() as the first thing in most driver's probe function. As a first step, it would be helpful to postpone it to a later point.
Well from what I knew that won't work in a lot of cases.
I mean what we could do on non-AMD HW is to remove the conflicting driver, play with the HW and if we find that this didn't work reset the HW using a PCI function level reset and try to load the EFI or whatever driver again. But that has a rather low chance of working reliably I would say.
The problem with AMD GPUs is that the PCI function level reset is broken to begin with (which already caused us tons of headache in the case of pass through).
Regards,
Christian.
>
>>
>> So I absolutely clearly have to reject the amdgpu patch in this series, that will break tons of use cases.
>
> Don't worry, we're still in the early ideation phase.
>
> Best regards
> Thomas
>
>>
>> Regards,
>> Christian.
>>
>>>> I also don't think it's possible or even desirable by some drivers to
>>>> reuse the initial state, good example here is vmwgfx where by default
>>>> some people will setup their vm's with e.g. 8mb ram, when the vmwgfx
>>>> loads we allow scanning out from system memory, so you can set your vm
>>>> up with 8mb of vram but still use 4k resolutions when the driver
>>>> loads, this way the suspend size of the vm is very predictable (tiny
>>>> vram plus whatever ram was setup) while still allowing a lot of
>>>> flexibility.
>>> If there's no initial state to switch from, the first modeset can fail while leaving the display unusable. There's no way around that. Going back to the old state is not an option unless the driver has been written to support this.
>>>
>>> The case of vmwgfx is special, but does not affect the overall problem. For vmwgfx, it would be best to import that initial state and support a transparent modeset from vram to system memory (and back) at least during this initial state.
>>>
>>>
>>>> In general I think however this is planned it's two or three separate series:
>>>> 1) infrastructure to reload the sysfb driver (what this series is)
>>>> 2) making sure that drivers that do want to recover cleanly actually
>>>> clean out all the state on exit properly,
>>>> 3) abstracting at least some of that cleanup in some driver independent way
>>> That's really not going to work. For example, in the current series, you invoke devm_aperture_remove_conflicting_pci_devices_done() after drm_mode_reset(), drm_dev_register() and drm_client_setup(). Each of these calls can modify hardware state. In the case of _register() and _setup(), the DRM clients can perform a modeset, which destroys the initial hardware state.
>>>
>>> Patch 1 of this series removes the sysfb device/driver entirely. That should be a no-go as it significantly complicates recovery. For example, if the native drivers failed from an allocation failure, the sysfb device/driver is not likely to come back either.
>>>
>>> As the very first thing, the series should state which failures it is going to resolve,
>>>
>>> - failed hardware init,
>>> - invalid initial modesetting,
>>> - runtime errors (such as ENOMEM, failed firmware loading),
>>> - others?
>>>
>>> And then specify how a recovery to sysfb could look in each supported scenario.
>>>
>>> In terms of implementation, make any transition between drivers gradual. The native driver needs to acquire the hardware resource (framebuffer and I/O apertures) without unloading the sysfb driver. Luckily there's struct drm_device.unplug, which does that. [1] Flipping this field disables hardware access for DRM drivers. All sysfb drivers support this.
>>>
>>> To get the sysfb drivers ready, I suggest dedicated helpers for each driver's aperture. The aperture helpers can use these callbacks to flip the DRM driver off and on again. For example, efidrm could do this as a minimum:
>>>
>>> int efidrm_aperture_suspend()
>>> {
>>>     dev->unplug = true;
>>>     remove_resource(/*framebuffer aperture*/);
>>>
>>>     return 0;
>>> }
>>>
>>> int efidrm_aperture_resume()
>>> {
>>>     insert_resource(/*framebuffer aperture*/);
>>>     dev->unplug = false;
>>>
>>>     return 0;
>>> }
>>>
>>> struct aperture_funcs efidrm_aperture_funcs = {
>>>     .suspend = efidrm_aperture_suspend,
>>>     .resume = efidrm_aperture_resume,
>>> };
>>>
>>> Pass this struct when efidrm acquires the framebuffer aperture, so that the aperture helpers can control the behavior of efidrm. With this, a multi-step takeover from sysfb to native driver can be tried.
>>>
>>> It's still a massive effort that requires an audit of each driver's probing logic. There's no copy-paste pattern AFAICT. I suggest to pick one simple driver first and make a prototype.
>>>
>>> Let me also say that I DO like the general idea you're proposing. But if it was easy, we would likely have done it already.
>>>
>>> Best regards
>>> Thomas
>>>> z
>
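The suspend/resume callback proposal quoted above can be turned into a small userspace model to show the intended control flow. Everything here is hypothetical: the struct and function names are invented for illustration, and no such aperture callback interface is claimed to exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks, following the shape proposed in the quoted text. */
struct aperture_funcs_model {
	int (*suspend)(void *owner);
	int (*resume)(void *owner);
};

/* Hypothetical sysfb driver state; 'unplugged' mirrors the unplug flag. */
struct sysfb_model {
	bool unplugged;
	bool owns_aperture;
};

static int sysfb_model_suspend(void *owner)
{
	struct sysfb_model *s = owner;

	s->unplugged = true;       /* stop hardware access first */
	s->owns_aperture = false;  /* then give up the aperture */
	return 0;
}

static int sysfb_model_resume(void *owner)
{
	struct sysfb_model *s = owner;

	s->owns_aperture = true;   /* reclaim the aperture */
	s->unplugged = false;      /* then allow hardware access again */
	return 0;
}

static const struct aperture_funcs_model sysfb_model_funcs = {
	.suspend = sysfb_model_suspend,
	.resume = sysfb_model_resume,
};

/*
 * Hypothetical takeover: suspend sysfb, run the native driver's init, and
 * resume sysfb if that init fails. Returns the init result.
 */
static int model_takeover(const struct aperture_funcs_model *funcs,
			  void *owner, int (*native_init)(void))
{
	int ret;

	assert(funcs != NULL && native_init != NULL);

	ret = funcs->suspend(owner);
	if (ret)
		return ret;

	ret = native_init();
	if (ret)
		funcs->resume(owner);
	return ret;
}

static int native_init_ok(void) { return 0; }
static int native_init_fail(void) { return -1; }
```

The model assumes a failed native init leaves the hardware untouched. On real devices that is exactly what cannot be assumed, as the AMD and Intel replies in this thread point out.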
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Ville Syrjälä @ 2026-01-15 15:10 UTC (permalink / raw)
To: Christian König
Cc: Thomas Zimmermann, Zack Rusin, dri-devel, Alex Deucher, amd-gfx,
Ard Biesheuvel, Ce Sun, Chia-I Wu, Danilo Krummrich, Dave Airlie,
Deepak Rawat, Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh,
Hans de Goede, Hawking Zhang, Helge Deller, intel-gfx, intel-xe,
Jani Nikula, Javier Martinez Canillas, Jocelyn Falempe,
Joonas Lahtinen, Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv,
linux-kernel, Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <9058636d-cc18-4c8f-92cf-782fd8f771af@amd.com>
On Thu, Jan 15, 2026 at 03:39:00PM +0100, Christian König wrote:
> Sorry for being late, but I only now realized what you are doing here.
>
> On 1/15/26 12:02, Thomas Zimmermann wrote:
> > Hi,
> >
> > apologies for the delay. I wanted to reply and then forgot about it.
> >
> > Am 10.01.26 um 05:52 schrieb Zack Rusin:
> >> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >>> Hi
> >>>
> >>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
> >>>> Almost a rite of passage for every DRM developer and most Linux users
> >>>> is upgrading your DRM driver/updating boot flags/changing some config
> >>>> and having DRM driver fail at probe resulting in a blank screen.
> >>>>
> >>>> Currently there's no way to recover from DRM driver probe failure. PCI
> >>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
> >>>> access to PCI resources so if the probe fails the system is left without
> >>>> a functioning display driver.
> >>>>
> >>>> Add code to sysfb to recover system framebuffer when DRM driver's probe
> >>>> fails. This means that a DRM driver that fails to load reloads the system
> >>>> framebuffer driver.
> >>>>
> >>>> This works best with simpledrm. Without it Xorg won't recover because
> >>>> it still tries to load the vendor specific driver which ends up usually
> >>>> not working at all. With simpledrm the system recovers really nicely
> >>>> ending up with a working console and not a blank screen.
> >>>>
> >>>> There's a caveat in that some hardware might require some special magic
> >>>> register write to recover EFI display. I'd appreciate it a lot if
> >>>> maintainers could introduce a temporary failure in their drivers
> >>>> probe to validate that the sysfb recovers and they get a working console.
> >>>> The easiest way to double check it is by adding:
> >>>> /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
> >>>> dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
> >>>> ret = -EINVAL;
> >>>> goto out_error;
> >>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
> >>> Recovering the display like that is guess work and will at best work
> >>> with simple discrete devices where the framebuffer is always located in
> >>> a confined graphics aperture.
> >>>
> >>> But the problem you're trying to solve is a real one.
> >>>
> >>> What we'd want to do instead is to take the initial hardware state into
> >>> account when we do the initial mode-setting operation.
> >>>
> >>> The first step is to move each driver's remove_conflicting_devices call
> >>> to the latest possible location in the probe function. We usually do it
> >>> first, because that's easy. But on most hardware, it could happen much
> >>> later.
> >> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
> >> they request pci regions which is going to fail otherwise. Because
> >> grabbing the pci resources is in general the very first thing that
> >> those drivers need to do to setup anything, we
> >> remove_conflicting_devices first or at least very early.
> >
> > To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
>
> Nope that is not correct.
>
> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.
>
> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
>
> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.
It's similar for Intel. For us VGA emulation won't be used for
EFI boot, but we still can't have the previous driver poking
around in memory while the real driver is initializing. The
entire memory layout may get completely shuffled so there's
no telling where such memory accesses would land.
And I suppose reBAR is a concern for us as well.
--
Ville Syrjälä
Intel
* Re: [PATCH 00/12] Recover sysfb after DRM probe failure
From: Thomas Zimmermann @ 2026-01-15 14:54 UTC (permalink / raw)
To: Christian König, Zack Rusin
Cc: dri-devel, Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun,
Chia-I Wu, Danilo Krummrich, Dave Airlie, Deepak Rawat,
Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Timur Kristóf, Tvrtko Ursulin,
virtualization, Vitaly Prosyak
In-Reply-To: <9058636d-cc18-4c8f-92cf-782fd8f771af@amd.com>
Hi
Am 15.01.26 um 15:39 schrieb Christian König:
> Sorry for being late, but I only now realized what you are doing here.
>
> On 1/15/26 12:02, Thomas Zimmermann wrote:
>> Hi,
>>
>> apologies for the delay. I wanted to reply and then forgot about it.
>>
>> Am 10.01.26 um 05:52 schrieb Zack Rusin:
>>> On Fri, Jan 9, 2026 at 5:34 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>> Hi
>>>>
>>>> Am 29.12.25 um 22:58 schrieb Zack Rusin:
>>>>> Almost a rite of passage for every DRM developer and most Linux users
>>>>> is upgrading your DRM driver/updating boot flags/changing some config
>>>>> and having DRM driver fail at probe resulting in a blank screen.
>>>>>
>>>>> Currently there's no way to recover from DRM driver probe failure. PCI
>>>>> DRM drivers explicitly throw out the existing sysfb to get exclusive
>>>>> access to PCI resources so if the probe fails the system is left without
>>>>> a functioning display driver.
>>>>>
>>>>> Add code to sysfb to recover the system framebuffer when a DRM driver's probe
>>>>> fails. This means that a DRM driver that fails to load reloads the system
>>>>> framebuffer driver.
>>>>>
>>>>> This works best with simpledrm. Without it Xorg won't recover because
>>>>> it still tries to load the vendor specific driver which ends up usually
>>>>> not working at all. With simpledrm the system recovers really nicely
>>>>> ending up with a working console and not a blank screen.
>>>>>
>>>>> There's a caveat in that some hardware might require some special magic
>>>>> register write to recover EFI display. I'd appreciate it a lot if
>>>>> maintainers could introduce a temporary failure in their drivers
>>>>> probe to validate that the sysfb recovers and they get a working console.
>>>>> The easiest way to double check it is by adding:
>>>>> /* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
>>>>> dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
>>>>> ret = -EINVAL;
>>>>> goto out_error;
>>>>> or such right after the devm_aperture_remove_conflicting_pci_devices .
>>>> Recovering the display like that is guess work and will at best work
>>>> with simple discrete devices where the framebuffer is always located in
>>>> a confined graphics aperture.
>>>>
>>>> But the problem you're trying to solve is a real one.
>>>>
>>>> What we'd want to do instead is to take the initial hardware state into
>>>> account when we do the initial mode-setting operation.
>>>>
>>>> The first step is to move each driver's remove_conflicting_devices call
>>>> to the latest possible location in the probe function. We usually do it
>>>> first, because that's easy. But on most hardware, it could happen much
>>>> later.
>>> Well, some drivers (vbox, vmwgfx, bochs and cirrus-qemu) do it because
>>> they request pci regions which is going to fail otherwise. Because
>>> grabbing the pci resources is in general the very first thing that
>>> those drivers need to do to set up anything, we
>>> remove_conflicting_devices first or at least very early.
>> To my knowledge, requesting resources is more about correctness than a hard requirement to use an I/O or memory range. Has this changed?
> Nope that is not correct.
>
> At least for AMD GPUs remove_conflicting_devices() really early is necessary because otherwise some operations just result in a spontaneous system reboot.
Here I was only talking about avoiding calls to request_resource() and
similar interfaces.
>
> For example resizing the PCIe BAR giving access to VRAM or disabling VGA emulation (which AFAIK is used for EFI as well) is only possible when the VGA or EFI framebuffer driver is kicked out first.
Yeah, that's what I expected.
>
> And disabling VGA emulation is among the absolutely first steps you do to take over the scanout config.
Assuming the driver (or driver author) is careful, is it possible to
only read state from AMD hardware at such an early time?
We usually do remove_conflicting_devices() as the first thing in most
drivers' probe functions. As a first step, it would be helpful to
postpone it to a later point.
>
> So I absolutely clearly have to reject the amdgpu patch in this series, that will break tons of use cases.
Don't worry, we're still in the early ideation phase.
Best regards
Thomas
>
> Regards,
> Christian.
>
>>> I also don't think it's possible or even desirable by some drivers to
>>> reuse the initial state, good example here is vmwgfx where by default
>>> some people will setup their vm's with e.g. 8mb ram, when the vmwgfx
>>> loads we allow scanning out from system memory, so you can set your vm
>>> up with 8mb of vram but still use 4k resolutions when the driver
>>> loads, this way the suspend size of the vm is very predictable (tiny
>>> vram plus whatever ram was setup) while still allowing a lot of
>>> flexibility.
>> If there's no initial state to switch from, the first modeset can fail while leaving the display unusable. There's no way around that. Going back to the old state is not an option unless the driver has been written to support this.
>>
>> The case of vmwgfx is special, but does not affect the overall problem. For vmwgfx, it would be best to import that initial state and support a transparent modeset from vram to system memory (and back) at least during this initial state.
>>
>>
>>> In general I think however this is planned it's two or three separate series:
>>> 1) infrastructure to reload the sysfb driver (what this series is)
>>> 2) making sure that drivers that do want to recover cleanly actually
>>> clean out all the state on exit properly,
>>> 3) abstracting at least some of that cleanup in some driver independent way
>> That's really not going to work. For example, in the current series, you invoke devm_aperture_remove_conflicting_pci_devices_done() after drm_mode_reset(), drm_dev_register() and drm_client_setup(). Each of these calls can modify hardware state. In the case of _register() and _setup(), the DRM clients can perform a modeset, which destroys the initial hardware state.
>>
>> Patch 1 of this series removes the sysfb device/driver entirely. That should be a no-go as it significantly complicates recovery. For example, if the native driver failed from an allocation failure, the sysfb device/driver is not likely to come back either.
>>
>> As the very first thing, the series should state which failures it is going to resolve,
>>
>>  - failed hardware init,
>>  - invalid initial modesetting,
>>  - runtime errors (such as ENOMEM, failed firmware loading),
>>  - others?
>>
>> And then specify how a recovery to sysfb could look in each supported scenario.
>>
>> In terms of implementation, make any transition between drivers gradual. The native driver needs to acquire the hardware resources (framebuffer and I/O apertures) without unloading the sysfb driver. Luckily there's struct drm_device.unplug, which does that. [1] Flipping this field disables hardware access for DRM drivers. All sysfb drivers support this.
>>
>> To get the sysfb drivers ready, I suggest dedicated helpers for each driver's aperture. The aperture helpers can use these callbacks to flip the DRM driver off and on again. For example, efidrm could do this as a minimum:
>>
>> int efidrm_aperture_suspend()
>> {
>>     dev->unplug = true;
>>     remove_resource(/*framebuffer aperture*/);
>>
>>     return 0;
>> }
>>
>> int efidrm_aperture_resume()
>> {
>>     insert_resource(/*framebuffer aperture*/);
>>     dev->unplug = false;
>>
>>     return 0;
>> }
>>
>> struct aperture_funcs efidrm_aperture_funcs = {
>>     .suspend = efidrm_aperture_suspend,
>>     .resume = efidrm_aperture_resume,
>> };
>>
>> Pass this struct when efidrm acquires the framebuffer aperture, so that the aperture helpers can control the behavior of efidrm. With this, a multi-step takeover from sysfb to native driver can be tried.
>>
>> It's still a massive effort that requires an audit of each driver's probing logic. There's no copy-paste pattern AFAICT. I suggest to pick one simple driver first and make a prototype.
>>
>> Let me also say that I DO like the general idea you're proposing. But if it was easy, we would likely have done it already.
>>
>> Best regards
>> Thomas
>>> z
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)
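
The suspend/resume callbacks quoted above can be modelled in plain
userspace C to show the intended ordering. This is only a sketch under
stated assumptions, not kernel code: struct sysfb_dev, its
aperture_claimed flag, try_takeover() and the two probe stubs are
hypothetical names made up for illustration, and struct aperture_funcs
merely mirrors the struct proposed in the mail. The boolean
aperture_claimed stands in for insert_resource()/remove_resource().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sysfb DRM device. */
struct sysfb_dev {
	bool unplug;           /* true: driver must not touch the hardware */
	bool aperture_claimed; /* models insert_resource()/remove_resource() */
};

/* Hypothetical callback table, mirroring the struct proposed in the mail. */
struct aperture_funcs {
	int (*suspend)(struct sysfb_dev *dev);
	int (*resume)(struct sysfb_dev *dev);
};

static int efidrm_aperture_suspend(struct sysfb_dev *dev)
{
	dev->unplug = true;            /* stop hardware access first */
	dev->aperture_claimed = false; /* then release the framebuffer range */
	return 0;
}

static int efidrm_aperture_resume(struct sysfb_dev *dev)
{
	dev->aperture_claimed = true;  /* re-claim the range first */
	dev->unplug = false;           /* then allow hardware access again */
	return 0;
}

static const struct aperture_funcs efidrm_aperture_funcs = {
	.suspend = efidrm_aperture_suspend,
	.resume = efidrm_aperture_resume,
};

/*
 * Model of one takeover attempt: suspend sysfb, run the native probe,
 * and resume sysfb only if that probe fails. Returns the probe result.
 */
static int try_takeover(struct sysfb_dev *dev,
			const struct aperture_funcs *funcs,
			int (*native_probe)(void))
{
	int ret;

	assert(dev && funcs && native_probe);

	ret = funcs->suspend(dev);
	if (ret)
		return ret;

	ret = native_probe();
	if (ret)
		funcs->resume(dev); /* fall back to the sysfb driver */

	return ret;
}

/* Probe stubs: -22 models -EINVAL from a failed native probe. */
static int failing_probe(void) { return -22; }
static int working_probe(void) { return 0; }
```

With failing_probe() the device ends up back in its initial state
(unplug cleared, aperture claimed). With working_probe() it stays
suspended, which is the point where a real implementation would remove
the sysfb driver for good. Whether the hardware can actually be handed
back after a partial native init is the per-driver question raised in
the thread, and this model does not address it.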