From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"
Date: Fri, 15 Dec 2023 13:37:35 +0100
Message-ID: <8bce512e-abb6-495d-85a4-63648229859e@gmail.com>
In-Reply-To: <CABXGCsMBWwRFRA+EJKF0v6BwZ+uTQHr4Yn9E9_iYgZ6KRbwsJQ@mail.gmail.com>
On 15.12.23 at 12:45, Mikhail Gavrilov wrote:
> On Tue, Feb 28, 2023 at 5:43 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> The point is it doesn't need to talk to the amdgpu hardware. What it
>> does is that it talks to the good old VGA/VESA emulation and that just
>> happens to be still enabled by the BIOS/GRUB.
>>
>> And that VGA/VESA emulation doesn't need any BAR or whatever to keep the
>> hw running in the state where it was initialized before the kernel
>> started. The kernel just grabs the addresses where it needs to write the
>> display data and keeps going with that.
>>
>> But when a hw-specific driver wants to load, this is the first thing
>> that gets disabled because we need to load new firmware. And with the
>> BARs disabled, this can't be re-enabled without rebooting the system.
>>
>>> My suggestion is that if
>>> amdgpu fails to talk to the hardware, then let another suitable driver
>>> do it. I attached a system log taken with "pci=nocrs" and
>>> "modprobe.blacklist=amdgpu" to show that graphics work fine in
>>> this case.
>>> To do this, does the Linux module loading mechanism need to be refined?
>> That's actually working as expected. The real problem is that the BIOS
>> on that system is so broken that we can't access the hw correctly.
>>
>> What we could do is check the BARs very early on and refuse to
>> load when they are disabled. The problem with this approach is that there
>> are systems where it is normal for the BARs to be disabled until the
>> driver loads and they get enabled during the hardware initialization process.
>>
>> What you might want to look into is to find a quirk for the BIOS to
>> properly enable the nvme controller.
>>
> That's interesting. I noticed that amdgpu now works even with the
> pci=nocrs parameter on 6.7.0-0.rc4 and later kernels.
> Does that mean the BARs became available?
> I attached the kernel log and lspci output here. What changed?
I have no idea :)
From the logs I can see that the GPU now has proper BARs assigned:
[ 5.722015] pci 0000:03:00.0: [1002:73df] type 00 class 0x038000
[ 5.722051] pci 0000:03:00.0: reg 0x10: [mem 0xf800000000-0xfbffffffff 64bit pref]
[ 5.722081] pci 0000:03:00.0: reg 0x18: [mem 0xfc00000000-0xfc0fffffff 64bit pref]
[ 5.722112] pci 0000:03:00.0: reg 0x24: [mem 0xfca00000-0xfcafffff]
[ 5.722134] pci 0000:03:00.0: reg 0x30: [mem 0xfcb00000-0xfcb1ffff pref]
[ 5.722368] pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
[ 5.722484] pci 0000:03:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x8 link at 0000:00:01.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
And with that the driver can work perfectly fine.
Have you updated the BIOS or added/removed some other hardware? Maybe
somebody added a quirk for your BIOS to the PCIe code or something
like that.
Regards,
Christian.
Thread overview: 13+ messages
2023-02-23 23:40 amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init" Mikhail Gavrilov
2023-02-24 7:12 ` Keyword Review - " Christian König
2023-02-24 7:13 ` Christian König
2023-02-24 8:38 ` Mikhail Gavrilov
2023-02-24 12:29 ` Christian König
2023-02-24 15:31 ` Christian König
2023-02-24 16:21 ` Mikhail Gavrilov
2023-02-27 10:22 ` Christian König
2023-02-28 9:52 ` Mikhail Gavrilov
2023-02-28 12:43 ` Christian König
2023-12-15 11:45 ` Mikhail Gavrilov
2023-12-15 12:37 ` Christian König [this message]
2023-12-19 9:45 ` Mikhail Gavrilov