From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>
Subject: Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"
Date: Fri, 24 Feb 2023 13:29:32 +0100
Message-ID: <43016018-4d0a-94dc-ce93-b4bff2dce71c@gmail.com>
In-Reply-To: <CABXGCsM7JPxtQm6B7vk+ZcXfphgQm=ArJZKiDUdbk9hujyRtmg@mail.gmail.com>
On 24.02.23 at 09:38, Mikhail Gavrilov wrote:
> On Fri, Feb 24, 2023 at 12:13 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Hi Mikhail,
>>
>> this is pretty clearly a problem with the system and/or its BIOS and
>> not the GPU hw or the driver.
>>
>> The option pci=nocrs makes the kernel ignore additional resource windows
>> the BIOS reports through ACPI. This then most likely leads to problems
>> with amdgpu because it can't bring up its PCIe resources any more.
>>
>> The output of "sudo lspci -vvvv -s $BUSID_OF_AMDGPU" might help
>> understand the problem
> I attach both lspci for pci=nocrs and without pci=nocrs.
>
> The differences for Cezanne Radeon Vega Series:
> with pci=nocrs:
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Interrupt: pin A routed to IRQ 255
> Region 4: I/O ports at e000 [disabled] [size=256]
> Capabilities: [c0] MSI-X: Enable- Count=4 Masked-
>
> Without pci=nocrs:
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Interrupt: pin A routed to IRQ 44
> Region 4: I/O ports at e000 [size=256]
> Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
>
>
> The differences for Navi 22 Radeon 6800M:
> with pci=nocrs:
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Interrupt: pin A routed to IRQ 255
> Region 0: Memory at f800000000 (64-bit, prefetchable) [disabled] [size=16G]
> Region 2: Memory at fc00000000 (64-bit, prefetchable) [disabled] [size=256M]
> Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=1M]
Well, that explains it. When the PCI subsystem has to disable the BARs of
the GPU, we can't access it any more.
The only thing we could do is to make sure that the driver at least
fails gracefully.
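A minimal sketch of how the two lspci dumps could be compared mechanically,
assuming the output of "lspci -vv" has been saved as text. The function name
summarize_lspci and the sample string are hypothetical and only illustrate
the check; the sample lines are copied from the Navi 22 output quoted above.

```python
import re

# Matches lspci "Region N: ..." lines; "[disabled]" marks a BAR that is not decoded.
REGION_RE = re.compile(r"Region (\d+): (Memory|I/O ports) at ([0-9a-f]+)(.*)")
# Matches the Control flags of interest, e.g. "I/O-", "Mem+", "BusMaster-".
FLAG_RE = re.compile(r"(I/O|Mem|BusMaster)([+-])")


def summarize_lspci(text):
    """Return (control_flags, disabled_regions) parsed from `lspci -vv` text.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    flags = {}
    disabled = []
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") and indentation in front of each line.
        line = line.lstrip("> ").strip()
        if line.startswith("Control:"):
            flags = {name: sign == "+" for name, sign in FLAG_RE.findall(line)}
        match = REGION_RE.match(line)
        if match and "[disabled]" in match.group(4):
            disabled.append(int(match.group(1)))
    return flags, disabled


# Sample taken from the Navi 22 output with pci=nocrs quoted in this thread.
SAMPLE_NOCRS = """\
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Region 0: Memory at f800000000 (64-bit, prefetchable) [disabled] [size=16G]
Region 2: Memory at fc00000000 (64-bit, prefetchable) [disabled] [size=256M]
Region 5: Memory at fca00000 (32-bit, non-prefetchable) [disabled] [size=1M]
"""

if __name__ == "__main__":
    flags, disabled = summarize_lspci(SAMPLE_NOCRS)
    print("Control flags:", flags)
    print("Disabled regions:", disabled)
```

With Mem- and every memory region marked [disabled], the driver has no
MMIO window to the device, which matches the failure described here.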
Do you still have network access to the box when amdgpu fails to load,
and could you grab whatever is in dmesg?
Thanks,
Christian.
> AtomicOpsCtl: ReqEn-
> Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
>
> Without pci=nocrs:
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 103
> Region 0: Memory at f800000000 (64-bit, prefetchable) [size=16G]
> Region 2: Memory at fc00000000 (64-bit, prefetchable) [size=256M]
> Region 5: Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
> AtomicOpsCtl: ReqEn+
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000fee00000 Data: 0000
>
>> but I strongly suggest trying a BIOS update first.
> This is the first thing that was done. And I am afraid there are no more BIOS updates.
> https://rog.asus.com/laptops/rog-strix/2021-rog-strix-g15-advantage-edition-series/helpdesk_bios/
>
> I also have experience in dealing with manufacturers' tech support.
> Usually it ends with "we do not provide drivers for Linux".
>