* [PATCH 1/2] firmware: sysfb: Allow re-creating system framebuffer after init
2022-12-22 18:30 [PATCH 0/2] Recover from failure to probe GPU Mario Limonciello
@ 2022-12-22 18:30 ` Mario Limonciello
2022-12-22 19:41 ` [PATCH 0/2] Recover from failure to probe GPU Javier Martinez Canillas
2022-12-24 9:34 ` Thomas Zimmermann
From: Mario Limonciello @ 2022-12-22 18:30 UTC
To: Javier Martinez Canillas, Alex Deucher, Ard Biesheuvel
Cc: Carlos Soriano Sanchez, amd-gfx, dri-devel, David Airlie,
Daniel Vetter, christian.koenig, Mario Limonciello, linux-efi,
linux-kernel
When a GPU kernel driver fails to load for any reason, the
current experience is a frozen screen. This is because one of
the first things these drivers do is call `sysfb_disable`.
For end users this is quite jarring and hard to recover from. Allow
drivers to request that the system framebuffer be re-created as part
of cleanup after a failed probe.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/firmware/efi/sysfb_efi.c | 6 +++---
drivers/firmware/sysfb.c | 15 ++++++++++++++-
drivers/firmware/sysfb_simplefb.c | 4 ++--
include/linux/sysfb.h | 5 +++++
4 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/firmware/efi/sysfb_efi.c b/drivers/firmware/efi/sysfb_efi.c
index 7882d4b3f2be..a890cb6d44fa 100644
--- a/drivers/firmware/efi/sysfb_efi.c
+++ b/drivers/firmware/efi/sysfb_efi.c
@@ -185,7 +185,7 @@ static int __init efifb_set_system(const struct dmi_system_id *id)
&efifb_dmi_list[enumid] \
}
-static const struct dmi_system_id efifb_dmi_system_table[] __initconst = {
+static const struct dmi_system_id efifb_dmi_system_table[] = {
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "iMac4,1", M_I17),
/* At least one of these two will be right; maybe both? */
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "iMac5,1", M_I20),
@@ -235,7 +235,7 @@ static const struct dmi_system_id efifb_dmi_system_table[] __initconst = {
* pitch). We simply swap width and height for these devices so that we can
* correctly deal with some of them coming with multiple resolutions.
*/
-static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = {
+static const struct dmi_system_id efifb_dmi_swap_width_height[] = {
{
/*
* Lenovo MIIX310-10ICR, only some batches have the troublesome
@@ -333,7 +333,7 @@ static const struct fwnode_operations efifb_fwnode_ops = {
#ifdef CONFIG_EFI
static struct fwnode_handle efifb_fwnode;
-__init void sysfb_apply_efi_quirks(struct platform_device *pd)
+void sysfb_apply_efi_quirks(struct platform_device *pd)
{
if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI ||
!(screen_info.capabilities & VIDEO_CAPABILITY_SKIP_QUIRKS))
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 3fd3563d962b..7f2254bd2071 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -69,7 +69,7 @@ void sysfb_disable(void)
}
EXPORT_SYMBOL_GPL(sysfb_disable);
-static __init int sysfb_init(void)
+static int sysfb_init(void)
{
struct screen_info *si = &screen_info;
struct simplefb_platform_data mode;
@@ -124,6 +124,19 @@ static __init int sysfb_init(void)
mutex_unlock(&disable_lock);
return ret;
}
+/**
+ * sysfb_enable() - re-enable the Generic System Framebuffers support
+ *
+ * This causes the system framebuffer initialization to be re-run.
+ * It is intended to be called by DRM drivers during cleanup after a failed probe.
+ *
+ */
+int sysfb_enable(void)
+{
+ disabled = false;
+ return sysfb_init();
+}
+EXPORT_SYMBOL_GPL(sysfb_enable);
/* must execute after PCI subsystem for EFI quirks */
device_initcall(sysfb_init);
diff --git a/drivers/firmware/sysfb_simplefb.c b/drivers/firmware/sysfb_simplefb.c
index a353e27f83f5..82735ff81191 100644
--- a/drivers/firmware/sysfb_simplefb.c
+++ b/drivers/firmware/sysfb_simplefb.c
@@ -24,7 +24,7 @@ static const char simplefb_resname[] = "BOOTFB";
static const struct simplefb_format formats[] = SIMPLEFB_FORMATS;
/* try parsing screen_info into a simple-framebuffer mode struct */
-__init bool sysfb_parse_mode(const struct screen_info *si,
+bool sysfb_parse_mode(const struct screen_info *si,
struct simplefb_platform_data *mode)
{
const struct simplefb_format *f;
@@ -57,7 +57,7 @@ __init bool sysfb_parse_mode(const struct screen_info *si,
return false;
}
-__init struct platform_device *sysfb_create_simplefb(const struct screen_info *si,
+struct platform_device *sysfb_create_simplefb(const struct screen_info *si,
const struct simplefb_platform_data *mode)
{
struct platform_device *pd;
diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h
index 8ba8b5be5567..14d447576e57 100644
--- a/include/linux/sysfb.h
+++ b/include/linux/sysfb.h
@@ -58,6 +58,7 @@ struct efifb_dmi_info {
#ifdef CONFIG_SYSFB
void sysfb_disable(void);
+int sysfb_enable(void);
#else /* CONFIG_SYSFB */
@@ -65,6 +66,10 @@ static inline void sysfb_disable(void)
{
}
+static inline int sysfb_enable(void)
+{
+ return -ENODEV;
+}
#endif /* CONFIG_SYSFB */
#ifdef CONFIG_EFI
--
2.34.1
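The state handling this patch adds can be modelled in ordinary userspace C. The sketch below is a hypothetical model, not kernel code: `model_sysfb_init()`, `model_sysfb_disable()` and `model_sysfb_enable()` are invented names that mirror `sysfb_init()`, `sysfb_disable()` and the proposed `sysfb_enable()`, and a boolean stands in for the registered framebuffer platform device. It assumes single-threaded use; the real code serializes on `disable_lock`, while the posted `sysfb_enable()` clears `disabled` without taking that lock. It models only the software bookkeeping, not whether the hardware can still scan out the firmware framebuffer.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the sysfb state; not kernel code. */
static bool disabled;       /* mirrors the 'disabled' flag in sysfb.c */
static bool fb_registered;  /* stands in for the firmware FB platform device */

/* Mirrors sysfb_init(): register the firmware FB unless sysfb is disabled. */
static int model_sysfb_init(void)
{
	if (disabled)
		return 0;
	fb_registered = true;
	return 0;
}

/* Mirrors sysfb_disable(): called when a native driver takes over. */
static void model_sysfb_disable(void)
{
	if (!disabled) {
		fb_registered = false;
		disabled = true;
	}
}

/* Mirrors the proposed sysfb_enable(): clear the flag, then re-run init. */
static int model_sysfb_enable(void)
{
	disabled = false;
	return model_sysfb_init();
}
```

In this model, calling `model_sysfb_init()` again while disabled is a no-op; only `model_sysfb_enable()` brings the framebuffer back, which is why the patch clears the flag before re-running init.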
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-22 18:30 [PATCH 0/2] Recover from failure to probe GPU Mario Limonciello
2022-12-22 18:30 ` [PATCH 1/2] firmware: sysfb: Allow re-creating system framebuffer after init Mario Limonciello
@ 2022-12-22 19:41 ` Javier Martinez Canillas
2022-12-23 15:51 ` Mario Limonciello
2022-12-24 9:34 ` Thomas Zimmermann
From: Javier Martinez Canillas @ 2022-12-22 19:41 UTC
To: Mario Limonciello, Alex Deucher, linux-efi, Thomas Zimmermann
Cc: Carlos Soriano Sanchez, amd-gfx, dri-devel, David Airlie,
Daniel Vetter, christian.koenig, linux-kernel
[adding Thomas Zimmermann to CC list]
Hello Mario,
Interesting case.
On 12/22/22 19:30, Mario Limonciello wrote:
> One of the first thing that KMS drivers do during initialization is
> destroy the system firmware framebuffer by means of
> `drm_aperture_remove_conflicting_pci_framebuffers`
>
The reason why that's done at the very beginning is that there are no
guarantees that the firmware-provided framebuffer would keep working
after the real display controller driver re-initializes the IP block.
> This means that if for any reason the GPU failed to probe the user
> will be stuck with at best a screen frozen at the last thing that
> was shown before the KMS driver continued it's probe.
>
> The problem is most pronounced when new GPU support is introduced
> because users will need to have a recent linux-firmware snapshot
> on their system when they boot a kernel with matching support.
>
Right. That's indeed a problem, but as mentioned there's a gap between
the time the firmware-provided framebuffer is removed and the time the
real driver sets up its own framebuffer.
> However the problem is further exaggerated in the case of amdgpu because
> it has migrated to "IP discovery" where amdgpu will attempt to load
> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> contained in that GPU.
>
> IP discovery requires some probing and isn't run until after the
> framebuffer has been destroyed.
>
> This means a situation can occur where a user purchases a new GPU not
> yet supported by a distribution and when booting the installer it will
> "freeze" even if the distribution doesn't have the matching kernel support
> for those IP blocks.
>
> The perfect example of this is Ubuntu 21.10 and the new dGPUs just
> launched by AMD. The installation media ships with kernel 5.19 (which
> has IP discovery) but the amdgpu support for those IP blocks landed in
> kernel 6.0. The matching linux-firmware was released after 21.10's launch.
> The screen will freeze without nomodeset. Even if a user manages to install
> and then upgrades to kernel 6.0 after install they'll still have the
> problem of missing firmware, and the same experience.
>
> This is quite jarring for users, particularly if they don't know
> that they have to use "nomodeset" to install.
>
I'm not familiar with AMD GPUs, but would it be possible to do this
discovery and firmware loading step at the beginning, before the firmware
FB is removed? That way the FB removal will not happen unless that succeeds.
> To help the situation, allow drivers to re-run the init process for the
> firmware framebuffer during a failed probe. As this problem is most
> pronounced with amdgpu, this is the only driver changed.
>
> But if this makes sense more generally for other KMS drivers, the call
> can be added to the cleanup routine for those too.
>
The problem I see is that, depending on how far the driver's probe
function went, it may not be possible to re-run the init process, since
the firmware-provided framebuffer may already have been destroyed or the
IP block may be left in a half-initialized state.
I'm not against this series if it solves the issue in practice for amdgpu,
but I don't think it is a general solution, and I would like to hear
Thomas' opinion on this as well.
--
Best regards,
Javier Martinez Canillas
Core Platforms
Red Hat
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-22 19:41 ` [PATCH 0/2] Recover from failure to probe GPU Javier Martinez Canillas
@ 2022-12-23 15:51 ` Mario Limonciello
From: Mario Limonciello @ 2022-12-23 15:51 UTC
To: Javier Martinez Canillas, Alex Deucher, linux-efi,
Thomas Zimmermann
Cc: Carlos Soriano Sanchez, amd-gfx, dri-devel, David Airlie,
Daniel Vetter, christian.koenig, linux-kernel
On 12/22/22 13:41, Javier Martinez Canillas wrote:
> [adding Thomas Zimmermann to CC list]
>
> Hello Mario,
>
> Interesting case.
>
> On 12/22/22 19:30, Mario Limonciello wrote:
>> One of the first thing that KMS drivers do during initialization is
>> destroy the system firmware framebuffer by means of
>> `drm_aperture_remove_conflicting_pci_framebuffers`
>>
>
> The reason why that's done at the very beginning is that there are no
> guarantees that the firmware-provided framebuffer would keep working
> after the real display controller driver re-initializes the IP block.
>
>> This means that if for any reason the GPU failed to probe the user
>> will be stuck with at best a screen frozen at the last thing that
>> was shown before the KMS driver continued it's probe.
>>
>> The problem is most pronounced when new GPU support is introduced
>> because users will need to have a recent linux-firmware snapshot
>> on their system when they boot a kernel with matching support.
>>
>
> Right. That's a problem indeed but as mentioned there's a gap between
> the firmware-provided framebuffer is removed and the real driver sets
> up its framebuffer.
>
>> However the problem is further exaggerated in the case of amdgpu because
>> it has migrated to "IP discovery" where amdgpu will attempt to load
>> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
>> contained in that GPU.
>>
>> IP discovery requires some probing and isn't run until after the
>> framebuffer has been destroyed.
>>
>> This means a situation can occur where a user purchases a new GPU not
>> yet supported by a distribution and when booting the installer it will
>> "freeze" even if the distribution doesn't have the matching kernel support
>> for those IP blocks.
>>
>> The perfect example of this is Ubuntu 21.10 and the new dGPUs just
>> launched by AMD. The installation media ships with kernel 5.19 (which
>> has IP discovery) but the amdgpu support for those IP blocks landed in
>> kernel 6.0. The matching linux-firmware was released after 21.10's launch.
>> The screen will freeze without nomodeset. Even if a user manages to install
>> and then upgrades to kernel 6.0 after install they'll still have the
>> problem of missing firmware, and the same experience.
s/21.10/22.10/
>>
>> This is quite jarring for users, particularly if they don't know
>> that they have to use "nomodeset" to install.
>>
>
> I'm not familiar with AMD GPUs, but could be possible that this discovery
> and firmware loading step be done at the beginning before the firmware FB
> is removed ? That way the FB removal will not happen unless that succeeds.
Possible? I think so, but maybe Alex can comment on this after the
holidays, as he's more familiar with this code.
It would mean splitting driver initialization and introducing an
entirely new phase. The information about the discovery table comes from VRAM.
amdgpu_driver_load_kms -> amdgpu_device_init -> amdgpu_device_ip_early_init
Basically, that specific code would have to be called earlier, and then
there would need to be a separate set of code for all the IP blocks to
*just* collect what firmware they need.
>
>> To help the situation, allow drivers to re-run the init process for the
>> firmware framebuffer during a failed probe. As this problem is most
>> pronounced with amdgpu, this is the only driver changed.
>>
>> But if this makes sense more generally for other KMS drivers, the call
>> can be added to the cleanup routine for those too.
>>
>
> The problem I see is that depending on how far the driver's probe function
> went, there may not be possible to re-run the init process. Since firmware
> provided framebuffer may already been destroyed or the IP block just be in
> a half initialized state.
>
> I'm not against this series if it solves the issue in practice for amdgpu,
> but don't think is a general solution and would like to know Thomas' opinion
> on this before as well
Running with this idea: I'm pretty sure that request_firmware() returns
-ENOENT in this case. So another proposal would be to trigger this flow
only on -ENOENT. We could then also change amdgpu_discovery.c to return
-ENOENT instead of the current -EINVAL when an IP block isn't supported.
Or we could instead co-opt -ENOTSUPP and remap to it all the cases in
which we explicitly want the system framebuffer to be re-initialized.
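A sketch of the error-code filter proposed above, written as a hypothetical helper (`should_restore_sysfb()` is an invented name, not an existing kernel function). It restores only on -ENOENT, which matches the "error -2" for the missing `amdgpu/yellow_carp_toc.bin` in the log quoted later in this thread. The -ENOTSUPP variant is left out because ENOTSUPP is a kernel-internal code that userspace `<errno.h>` does not define.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical probe-failure policy: re-create the system framebuffer
 * only when the failure means "this kernel/firmware combination cannot
 * drive the GPU yet", not for arbitrary errors.
 */
static bool should_restore_sysfb(int err)
{
	switch (err) {
	case -ENOENT:	/* request_firmware(): firmware file not found */
		return true;
	default:	/* e.g. -EINVAL, currently returned for an unsupported IP block */
		return false;
	}
}
```

Under this scheme, amdgpu_discovery.c would be changed to return -ENOENT for unsupported IP blocks so that they take the restore path as well.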
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-22 18:30 [PATCH 0/2] Recover from failure to probe GPU Mario Limonciello
2022-12-22 18:30 ` [PATCH 1/2] firmware: sysfb: Allow re-creating system framebuffer after init Mario Limonciello
2022-12-22 19:41 ` [PATCH 0/2] Recover from failure to probe GPU Javier Martinez Canillas
@ 2022-12-24 9:34 ` Thomas Zimmermann
2022-12-25 15:30 ` Christian König
From: Thomas Zimmermann @ 2022-12-24 9:34 UTC
To: Mario Limonciello, Javier Martinez Canillas, Alex Deucher,
linux-efi
Cc: linux-kernel, dri-devel, amd-gfx, Carlos Soriano Sanchez,
christian.koenig
Hi
Am 22.12.22 um 19:30 schrieb Mario Limonciello:
> One of the first thing that KMS drivers do during initialization is
> destroy the system firmware framebuffer by means of
> `drm_aperture_remove_conflicting_pci_framebuffers`
>
> This means that if for any reason the GPU failed to probe the user
> will be stuck with at best a screen frozen at the last thing that
> was shown before the KMS driver continued it's probe.
>
> The problem is most pronounced when new GPU support is introduced
> because users will need to have a recent linux-firmware snapshot
> on their system when they boot a kernel with matching support.
>
> However the problem is further exaggerated in the case of amdgpu because
> it has migrated to "IP discovery" where amdgpu will attempt to load
> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> contained in that GPU.
>
> IP discovery requires some probing and isn't run until after the
> framebuffer has been destroyed.
>
> This means a situation can occur where a user purchases a new GPU not
> yet supported by a distribution and when booting the installer it will
> "freeze" even if the distribution doesn't have the matching kernel support
> for those IP blocks.
>
> The perfect example of this is Ubuntu 21.10 and the new dGPUs just
> launched by AMD. The installation media ships with kernel 5.19 (which
> has IP discovery) but the amdgpu support for those IP blocks landed in
> kernel 6.0. The matching linux-firmware was released after 21.10's launch.
> The screen will freeze without nomodeset. Even if a user manages to install
> and then upgrades to kernel 6.0 after install they'll still have the
> problem of missing firmware, and the same experience.
>
> This is quite jarring for users, particularly if they don't know
> that they have to use "nomodeset" to install.
>
> To help the situation, allow drivers to re-run the init process for the
> firmware framebuffer during a failed probe. As this problem is most
> pronounced with amdgpu, this is the only driver changed.
>
> But if this makes sense more generally for other KMS drivers, the call
> can be added to the cleanup routine for those too.
Just a quick drive-by comment: as Javier noted, at some point while
probing, your driver has changed the device's state and the system FB
will be gone. You cannot re-establish the sysfb after that.
You are, however, free to read device state at any time, as long as it
has no side effects.
So why not just move the call to
drm_aperture_remove_conflicting_pci_framebuffers() to a later point when
you know that your driver supports the hardware? That's the solution we
always proposed to this kind of problem. It's safe and won't require any
changes to the aperture helpers.
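A minimal userspace model of the ordering suggested above: decide whether the hardware is supported using only side-effect-free reads, and only then remove the conflicting firmware framebuffer. All names are hypothetical; `model_remove_conflicting_fb()` merely stands in for drm_aperture_remove_conflicting_pci_framebuffers(), and `supported` stands in for whatever read-only check a driver could perform. For amdgpu, the hard part (as Christian points out in his reply) is making such a check possible before the framebuffer is taken over.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; none of these are real DRM/amdgpu symbols. */
struct model_dev {
	bool supported;		/* what a side-effect-free check would find */
	bool sysfb_present;	/* firmware framebuffer still registered */
	bool hw_reprogrammed;	/* driver has touched the display hardware */
};

/* Read-only check: must not change device state. */
static bool model_check_support(const struct model_dev *d)
{
	return d->supported;
}

/* Stand-in for drm_aperture_remove_conflicting_pci_framebuffers(). */
static void model_remove_conflicting_fb(struct model_dev *d)
{
	d->sysfb_present = false;
}

static int model_probe(struct model_dev *d)
{
	/* 1. Decide first, without side effects. */
	if (!model_check_support(d))
		return -ENODEV;	/* firmware FB untouched; console keeps working */

	/* 2. Only now take over the framebuffer aperture... */
	model_remove_conflicting_fb(d);

	/* 3. ...and start reprogramming the hardware. */
	d->hw_reprogrammed = true;
	return 0;
}
```

The design point is that a failed probe never reaches step 2, so there is nothing to restore and no need for a new sysfb API.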
Best regards
Thomas
>
> Here is a sample of what happens with missing GPU firmware and this
> series:
>
> [ 5.950056] amdgpu 0000:63:00.0: vgaarb: deactivate vga console
> [ 5.950114] amdgpu 0000:63:00.0: enabling device (0006 -> 0007)
> [ 5.950883] [drm] initializing kernel modesetting (YELLOW_CARP 0x1002:0x1681 0x17AA:0x22F1 0xD2).
> [ 5.952954] [drm] register mmio base: 0xB0A00000
> [ 5.952958] [drm] register mmio size: 524288
> [ 5.954633] [drm] add ip block number 0 <nv_common>
> [ 5.954636] [drm] add ip block number 1 <gmc_v10_0>
> [ 5.954637] [drm] add ip block number 2 <navi10_ih>
> [ 5.954638] [drm] add ip block number 3 <psp>
> [ 5.954639] [drm] add ip block number 4 <smu>
> [ 5.954641] [drm] add ip block number 5 <dm>
> [ 5.954642] [drm] add ip block number 6 <gfx_v10_0>
> [ 5.954643] [drm] add ip block number 7 <sdma_v5_2>
> [ 5.954644] [drm] add ip block number 8 <vcn_v3_0>
> [ 5.954645] [drm] add ip block number 9 <jpeg_v3_0>
> [ 5.954663] amdgpu 0000:63:00.0: amdgpu: Fetched VBIOS from VFCT
> [ 5.954666] amdgpu: ATOM BIOS: 113-REMBRANDT-X37
> [ 5.954677] [drm] VCN(0) decode is enabled in VM mode
> [ 5.954678] [drm] VCN(0) encode is enabled in VM mode
> [ 5.954680] [drm] JPEG decode is enabled in VM mode
> [ 5.954681] amdgpu 0000:63:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
> [ 5.954683] amdgpu 0000:63:00.0: amdgpu: PCIE atomic ops is not supported
> [ 5.954724] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
> [ 5.954732] amdgpu 0000:63:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> [ 5.954735] amdgpu 0000:63:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
> [ 5.954738] amdgpu 0000:63:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
> [ 5.954747] [drm] Detected VRAM RAM=512M, BAR=512M
> [ 5.954750] [drm] RAM width 256bits LPDDR5
> [ 5.954834] [drm] amdgpu: 512M of VRAM memory ready
> [ 5.954838] [drm] amdgpu: 15680M of GTT memory ready.
> [ 5.954873] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [ 5.955333] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
> [ 5.955502] amdgpu 0000:63:00.0: Direct firmware load for amdgpu/yellow_carp_toc.bin failed with error -2
> [ 5.955505] amdgpu 0000:63:00.0: amdgpu: fail to request/validate toc microcode
> [ 5.955510] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
> [ 5.955725] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
> [ 5.955952] amdgpu 0000:63:00.0: amdgpu: amdgpu_device_ip_init failed
> [ 5.955954] amdgpu 0000:63:00.0: amdgpu: Fatal error during GPU init
> [ 5.955957] amdgpu 0000:63:00.0: amdgpu: amdgpu: finishing device.
> [ 5.971162] efifb: probing for efifb
> [ 5.971281] efifb: showing boot graphics
> [ 5.974803] efifb: framebuffer at 0x910000000, using 20252k, total 20250k
> [ 5.974805] efifb: mode is 2880x1800x32, linelength=11520, pages=1
> [ 5.974807] efifb: scrolling: redraw
> [ 5.974807] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> [ 5.974974] Console: switching to colour frame buffer device 180x56
> [ 5.978181] fb0: EFI VGA frame buffer device
> [ 5.978199] amdgpu: probe of 0000:63:00.0 failed with error -2
> [ 5.978285] [drm] amdgpu: ttm finalized
>
> Now if the user loads the firmware into the system they can re-load the
> driver or re-attach using sysfs and it gracefully recovers.
>
> [ 665.080480] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:63:00.0 on minor 0
> [ 665.090075] fbcon: amdgpudrmfb (fb0) is primary device
> [ 665.090248] [drm] DSC precompute is not needed.
>
> Mario Limonciello (2):
> firmware: sysfb: Allow re-creating system framebuffer after init
> drm/amd: Re-create firmware framebuffer on failure to probe
>
> drivers/firmware/efi/sysfb_efi.c | 6 +++---
> drivers/firmware/sysfb.c | 15 ++++++++++++++-
> drivers/firmware/sysfb_simplefb.c | 4 ++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 ++
> include/linux/sysfb.h | 5 +++++
> 5 files changed, 26 insertions(+), 6 deletions(-)
>
>
> base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-24 9:34 ` Thomas Zimmermann
@ 2022-12-25 15:30 ` Christian König
2022-12-27 15:40 ` Alex Deucher
From: Christian König @ 2022-12-25 15:30 UTC
To: Thomas Zimmermann, Mario Limonciello, Javier Martinez Canillas,
Alex Deucher, linux-efi
Cc: linux-kernel, dri-devel, amd-gfx, Carlos Soriano Sanchez
Am 24.12.22 um 10:34 schrieb Thomas Zimmermann:
> Hi
>
> Am 22.12.22 um 19:30 schrieb Mario Limonciello:
>> One of the first thing that KMS drivers do during initialization is
>> destroy the system firmware framebuffer by means of
>> `drm_aperture_remove_conflicting_pci_framebuffers`
>>
>> This means that if for any reason the GPU failed to probe the user
>> will be stuck with at best a screen frozen at the last thing that
>> was shown before the KMS driver continued it's probe.
>>
>> The problem is most pronounced when new GPU support is introduced
>> because users will need to have a recent linux-firmware snapshot
>> on their system when they boot a kernel with matching support.
>>
>> However the problem is further exaggerated in the case of amdgpu because
>> it has migrated to "IP discovery" where amdgpu will attempt to load
>> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
>> contained in that GPU.
>>
>> IP discovery requires some probing and isn't run until after the
>> framebuffer has been destroyed.
>>
>> This means a situation can occur where a user purchases a new GPU not
>> yet supported by a distribution and when booting the installer it will
>> "freeze" even if the distribution doesn't have the matching kernel support
>> for those IP blocks.
>>
>> The perfect example of this is Ubuntu 21.10 and the new dGPUs just
>> launched by AMD. The installation media ships with kernel 5.19 (which
>> has IP discovery) but the amdgpu support for those IP blocks landed in
>> kernel 6.0. The matching linux-firmware was released after 21.10's launch.
>> The screen will freeze without nomodeset. Even if a user manages to install
>> and then upgrades to kernel 6.0 after install they'll still have the
>> problem of missing firmware, and the same experience.
>>
>> This is quite jarring for users, particularly if they don't know
>> that they have to use "nomodeset" to install.
>>
>> To help the situation, allow drivers to re-run the init process for the
>> firmware framebuffer during a failed probe. As this problem is most
>> pronounced with amdgpu, this is the only driver changed.
>>
>> But if this makes sense more generally for other KMS drivers, the call
>> can be added to the cleanup routine for those too.
>
> Just a quick drive-by comment: as Javier noted, at some point while
> probing, your driver has changed the device' state and the system FB
> will be gone. you cannot reestablish the sysfb after that.
I was about to note exactly that as well. This effort here is
unfortunately pretty pointless.
>
> You are, however free to read device state at any time, as long as it
> has no side effects.
>
> So why not just move the call to
> drm_aperture_remove_conflicting_pci_framebuffers() to a later point
> when you know that your driver supports the hardware? That's the
> solution we always proposed to this kind of problem. It's safe and
> won't require any changes to the aperture helpers.
If I'm not completely mistaken, that's a little bit tricky. Currently
it's not possible to read the discovery table before disabling the VGA
and/or current framebuffer.
We might be able to do this, but it's probably not easy.
Regards,
Christian.
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-25 15:30 ` Christian König
@ 2022-12-27 15:40 ` Alex Deucher
2022-12-27 17:04 ` Alex Deucher
From: Alex Deucher @ 2022-12-27 15:40 UTC
To: Christian König
Cc: Thomas Zimmermann, Mario Limonciello, Javier Martinez Canillas,
Alex Deucher, linux-efi, Carlos Soriano Sanchez, amd-gfx,
linux-kernel, dri-devel
On Sun, Dec 25, 2022 at 10:31 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 24.12.22 um 10:34 schrieb Thomas Zimmermann:
> > Hi
> >
> > Am 22.12.22 um 19:30 schrieb Mario Limonciello:
> >> One of the first thing that KMS drivers do during initialization is
> >> destroy the system firmware framebuffer by means of
> >> `drm_aperture_remove_conflicting_pci_framebuffers`
> >>
> >> This means that if for any reason the GPU failed to probe the user
> >> will be stuck with at best a screen frozen at the last thing that
> >> was shown before the KMS driver continued it's probe.
> >>
> >> The problem is most pronounced when new GPU support is introduced
> >> because users will need to have a recent linux-firmware snapshot
> >> on their system when they boot a kernel with matching support.
> >>
> >> However the problem is further exaggerated in the case of amdgpu because
> >> it has migrated to "IP discovery" where amdgpu will attempt to load
> >> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> >> contained in that GPU.
> >>
> >> IP discovery requires some probing and isn't run until after the
> >> framebuffer has been destroyed.
> >>
> >> This means a situation can occur where a user purchases a new GPU not
> >> yet supported by a distribution and when booting the installer it will
> >> "freeze" even if the distribution doesn't have the matching kernel support
> >> for those IP blocks.
> >>
> >> The perfect example of this is Ubuntu 21.10 and the new dGPUs just
> >> launched by AMD. The installation media ships with kernel 5.19 (which
> >> has IP discovery) but the amdgpu support for those IP blocks landed in
> >> kernel 6.0. The matching linux-firmware was released after 21.10's launch.
> >> The screen will freeze without nomodeset. Even if a user manages to install
> >> and then upgrades to kernel 6.0 after install they'll still have the
> >> problem of missing firmware, and the same experience.
> >>
> >> This is quite jarring for users, particularly if they don't know
> >> that they have to use "nomodeset" to install.
> >>
> >> To help the situation, allow drivers to re-run the init process for the
> >> firmware framebuffer during a failed probe. As this problem is most
> >> pronounced with amdgpu, this is the only driver changed.
> >>
> >> But if this makes sense more generally for other KMS drivers, the call
> >> can be added to the cleanup routine for those too.
> >
> > Just a quick drive-by comment: as Javier noted, at some point while
> > probing, your driver has changed the device's state and the system FB
> > will be gone. You cannot reestablish the sysfb after that.
>
> I was about to note exactly that as well. This effort here is
> unfortunately pretty pointless.
>
> >
> > You are, however, free to read device state at any time, as long as it
> > has no side effects.
> >
> > So why not just move the call to
> > drm_aperture_remove_conflicting_pci_framebuffers() to a later point
> > when you know that your driver supports the hardware? That's the
> > solution we always proposed to this kind of problem. It's safe and
> > won't require any changes to the aperture helpers.
>
> If I'm not completely mistaken, that's a little bit tricky. Currently
> it's not possible to read the discovery table before disabling the VGA
> and/or current framebuffer.
>
> We might be able to do this, but it's probably not easy.
It should be possible. It's populated by the PSP/VBIOS at power up,
so all you need to do is read the right offset in vram. For
firmwares, we currently read them from the filesystem from the
relevant IP code, but we could also just read it in amdgpu_discovery.c
when we walk the IP discovery table.
Alex
>
> Regards,
> Christian.
>
>
> >
> > Best regards
> > Thomas
> >
> >>
> >> Here is a sample of what happens with missing GPU firmware and this
> >> series:
> >>
> >> [ 5.950056] amdgpu 0000:63:00.0: vgaarb: deactivate vga console
> >> [ 5.950114] amdgpu 0000:63:00.0: enabling device (0006 -> 0007)
> >> [ 5.950883] [drm] initializing kernel modesetting (YELLOW_CARP 0x1002:0x1681 0x17AA:0x22F1 0xD2).
> >> [ 5.952954] [drm] register mmio base: 0xB0A00000
> >> [ 5.952958] [drm] register mmio size: 524288
> >> [ 5.954633] [drm] add ip block number 0 <nv_common>
> >> [ 5.954636] [drm] add ip block number 1 <gmc_v10_0>
> >> [ 5.954637] [drm] add ip block number 2 <navi10_ih>
> >> [ 5.954638] [drm] add ip block number 3 <psp>
> >> [ 5.954639] [drm] add ip block number 4 <smu>
> >> [ 5.954641] [drm] add ip block number 5 <dm>
> >> [ 5.954642] [drm] add ip block number 6 <gfx_v10_0>
> >> [ 5.954643] [drm] add ip block number 7 <sdma_v5_2>
> >> [ 5.954644] [drm] add ip block number 8 <vcn_v3_0>
> >> [ 5.954645] [drm] add ip block number 9 <jpeg_v3_0>
> >> [ 5.954663] amdgpu 0000:63:00.0: amdgpu: Fetched VBIOS from VFCT
> >> [ 5.954666] amdgpu: ATOM BIOS: 113-REMBRANDT-X37
> >> [ 5.954677] [drm] VCN(0) decode is enabled in VM mode
> >> [ 5.954678] [drm] VCN(0) encode is enabled in VM mode
> >> [ 5.954680] [drm] JPEG decode is enabled in VM mode
> >> [ 5.954681] amdgpu 0000:63:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
> >> [ 5.954683] amdgpu 0000:63:00.0: amdgpu: PCIE atomic ops is not supported
> >> [ 5.954724] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
> >> [ 5.954732] amdgpu 0000:63:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> >> [ 5.954735] amdgpu 0000:63:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
> >> [ 5.954738] amdgpu 0000:63:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
> >> [ 5.954747] [drm] Detected VRAM RAM=512M, BAR=512M
> >> [ 5.954750] [drm] RAM width 256bits LPDDR5
> >> [ 5.954834] [drm] amdgpu: 512M of VRAM memory ready
> >> [ 5.954838] [drm] amdgpu: 15680M of GTT memory ready.
> >> [ 5.954873] [drm] GART: num cpu pages 262144, num gpu pages 262144
> >> [ 5.955333] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
> >> [ 5.955502] amdgpu 0000:63:00.0: Direct firmware load for amdgpu/yellow_carp_toc.bin failed with error -2
> >> [ 5.955505] amdgpu 0000:63:00.0: amdgpu: fail to request/validate toc microcode
> >> [ 5.955510] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
> >> [ 5.955725] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
> >> [ 5.955952] amdgpu 0000:63:00.0: amdgpu: amdgpu_device_ip_init failed
> >> [ 5.955954] amdgpu 0000:63:00.0: amdgpu: Fatal error during GPU init
> >> [ 5.955957] amdgpu 0000:63:00.0: amdgpu: amdgpu: finishing device.
> >> [ 5.971162] efifb: probing for efifb
> >> [ 5.971281] efifb: showing boot graphics
> >> [ 5.974803] efifb: framebuffer at 0x910000000, using 20252k, total 20250k
> >> [ 5.974805] efifb: mode is 2880x1800x32, linelength=11520, pages=1
> >> [ 5.974807] efifb: scrolling: redraw
> >> [ 5.974807] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> >> [ 5.974974] Console: switching to colour frame buffer device 180x56
> >> [ 5.978181] fb0: EFI VGA frame buffer device
> >> [ 5.978199] amdgpu: probe of 0000:63:00.0 failed with error -2
> >> [ 5.978285] [drm] amdgpu: ttm finalized
> >>
> >> Now, if the user installs the missing firmware, they can reload the
> >> driver or re-bind it using sysfs, and it recovers gracefully.
> >>
> >> [ 665.080480] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:63:00.0 on minor 0
> >> [ 665.090075] fbcon: amdgpudrmfb (fb0) is primary device
> >> [ 665.090248] [drm] DSC precompute is not needed.
> >>
> >> Mario Limonciello (2):
> >> firmware: sysfb: Allow re-creating system framebuffer after init
> >> drm/amd: Re-create firmware framebuffer on failure to probe
> >>
> >> drivers/firmware/efi/sysfb_efi.c | 6 +++---
> >> drivers/firmware/sysfb.c | 15 ++++++++++++++-
> >> drivers/firmware/sysfb_simplefb.c | 4 ++--
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 ++
> >> include/linux/sysfb.h | 5 +++++
> >> 5 files changed, 26 insertions(+), 6 deletions(-)
> >>
> >>
> >> base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
> >
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-27 15:40 ` Alex Deucher
@ 2022-12-27 17:04 ` Alex Deucher
2022-12-27 19:23 ` Javier Martinez Canillas
0 siblings, 1 reply; 9+ messages in thread
From: Alex Deucher @ 2022-12-27 17:04 UTC (permalink / raw)
To: Christian König
Cc: Thomas Zimmermann, Mario Limonciello, Javier Martinez Canillas,
Alex Deucher, linux-efi, Carlos Soriano Sanchez, amd-gfx,
linux-kernel, dri-devel
On Tue, Dec 27, 2022 at 10:40 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Sun, Dec 25, 2022 at 10:31 AM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > [...]
> > If I'm not completely mistaken, that's a little bit tricky. Currently
> > it's not possible to read the discovery table before disabling the VGA
> > and/or current framebuffer.
> >
> > We might be able to do this, but it's probably not easy.
>
>
> It should be possible. It's populated by the PSP/VBIOS at power up,
> so all you need to do is read the right offset in vram. For
> firmwares, we currently read them from the filesystem from the
> relevant IP code, but we could also just read it in amdgpu_discovery.c
> when we walk the IP discovery table.
I think something like this would do the trick:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2017b3466612..45aee27ab6b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2141,6 +2141,11 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
break;
}

+ /* Get rid of things like offb */
+ r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev, &amdgpu_kms_driver);
+ if (r)
+ return r;
+
if (amdgpu_has_atpx() &&
(amdgpu_is_atpx_hybrid() ||
amdgpu_has_atpx_dgpu_power_cntl()) &&
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b8cfa48fb296..4e74d7abc3c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2123,11 +2123,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
}
#endif

- /* Get rid of things like offb */
- ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
- if (ret)
- return ret;
-
adev = devm_drm_dev_alloc(&pdev->dev, &amdgpu_kms_driver,
typeof(*adev), ddev);
if (IS_ERR(adev))
return PTR_ERR(adev);
>
> Alex
>
>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] Recover from failure to probe GPU
2022-12-27 17:04 ` Alex Deucher
@ 2022-12-27 19:23 ` Javier Martinez Canillas
0 siblings, 0 replies; 9+ messages in thread
From: Javier Martinez Canillas @ 2022-12-27 19:23 UTC (permalink / raw)
To: Alex Deucher, Christian König
Cc: Thomas Zimmermann, Mario Limonciello, Alex Deucher, linux-efi,
Carlos Soriano Sanchez, amd-gfx, linux-kernel, dri-devel
Hello Alex,
On 12/27/22 18:04, Alex Deucher wrote:
[...]
>
> I think something like this would do the trick:
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2017b3466612..45aee27ab6b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2141,6 +2141,11 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> break;
> }
>
> + /* Get rid of things like offb */
> + r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev, &amdgpu_kms_driver);
> + if (r)
> + return r;
> +
> if (amdgpu_has_atpx() &&
> (amdgpu_is_atpx_hybrid() ||
> amdgpu_has_atpx_dgpu_power_cntl()) &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b8cfa48fb296..4e74d7abc3c2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2123,11 +2123,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> }
> #endif
>
> - /* Get rid of things like offb */
> - ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
> - if (ret)
> - return ret;
> -
I'm not familiar with the amdgpu driver but yes, something like that
is what I had in mind.
--
Best regards,
Javier Martinez Canillas
Core Platforms
Red Hat
^ permalink raw reply [flat|nested] 9+ messages in thread
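Alex's suggestion of reading firmware while walking the IP discovery table amounts to checking, before the firmware framebuffer is removed, that every discovered block has its firmware available. The sketch below models only that check, under stated assumptions: `struct model_ip_block` and `model_check_firmware()` are hypothetical names, the list of available file names stands in for the kernel firmware loader (request_firmware() and its search paths are not modelled), and the only firmware name taken from this thread is amdgpu/yellow_carp_toc.bin from the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one discovered IP block. */
struct model_ip_block {
	const char *name;     /* e.g. "psp", as in "add ip block number 3 <psp>" */
	const char *firmware; /* firmware file the block needs, or NULL if none */
};

/* Stand-in for a firmware lookup: is @path among the available files? */
static bool model_fw_available(const char *path, const char *const *available,
			       size_t n_available)
{
	for (size_t i = 0; i < n_available; i++)
		if (strcmp(path, available[i]) == 0)
			return true;
	return false;
}

/*
 * Walk the discovered blocks and check that each required firmware file is
 * available. Nothing here touches the display, so in the ordering discussed
 * above it could run before the firmware framebuffer is removed. On failure,
 * *missing points at the first absent file and -ENOENT is returned.
 */
static int model_check_firmware(const struct model_ip_block *blocks, size_t n,
				const char *const *available, size_t n_available,
				const char **missing)
{
	assert(blocks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].firmware &&
		    !model_fw_available(blocks[i].firmware, available, n_available)) {
			if (missing)
				*missing = blocks[i].firmware;
			return -ENOENT;
		}
	}
	if (missing)
		*missing = NULL;
	return 0;
}
```

With a block list containing a psp entry that needs amdgpu/yellow_carp_toc.bin and no firmware installed, the check reports that file and returns -ENOENT without any side effects on the display.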