Linux PCI subsystem development
From: "Christian König" <christian.koenig@amd.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>, "Dag B" <dag@bakke.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>, linux-pci@vger.kernel.org
Subject: Re: PCIE BAR resizing blocked by another BAR on same device?
Date: Fri, 19 Apr 2024 17:31:28 +0200
Message-ID: <71456a7b-8dbd-4108-ad15-ddd7c0811b0d@amd.com>
In-Reply-To: <815337f1-920f-b2ad-7f28-b1b366eb23f5@linux.intel.com>

On 19.04.24 at 17:19, Ilpo Järvinen wrote:
> On Thu, 18 Apr 2024, Dag B wrote:
>> On 18.04.2024 14:24, Christian König wrote:
>>> On 18.04.24 at 12:42, Dag B wrote:
>>>> [SNIP]
>>>>>>> Is there a good ELI13 resource explaining how resizable BAR works in
>>>>>>> Linux?
>>>>>>>
>>>>>>> My current kernel command-line contains: pci=assign-busses,realloc
>>>>> That's a really really bad idea. The "assign-busses" flag was introduced
>>>>> to get 20year old laptops to see their cardbus PCI devices.
>>>> I threw a lot of mud at the wall to see what stuck. Removing it now did
>>>> not make a big difference.
>>>>
>>>> Removing realloc prevents the second TB3 GPU from being initialized, so
>>>> keeping that for now.
>>> That's really interesting. Why does it fail without that?
>>>
>>> It basically means that your BIOS is somehow broken and only the Linux PCI
>>> subsystem is able to assign resources correctly.
>>>
>>> Please provide the output of "sudo lspci -v" and "sudo lspci -tv" as file
>>> attachment (*not* inline in a mail!).
>>
>> In case I have expressed myself awkwardly, the realloc=off case appears to
>> make the device driver have issues with the second GPU.
>>
>>
>> I have attached both outputs, for realloc=off.
>>
>> Not knowing what is considered acceptable message sizes on this m/l, I
>> uploaded the same for realloc=on, as well as output from dmesg for both cases
>> to:
>>
>> https://github.com/dagbdagb/p53
>>
>> If the m/l has mechanisms to archive attachments and replace them with links,
>> I'll redo the exercise in a follow-up email. I understand the value of having
>> the 'context' of the discussion readily available in one place.
> The mem BAR & bridge window configuration is identical between
> realloc=on/off.
>
> The error seems to relate to io BAR:
>
> [    2.782439] nvidia 0000:09:00.0: BAR 5 [io  0x0000-0x007f]: not claimed; can't enable device
> [    2.783139] NVRM: pci_enable_device failed, aborting
>
> With realloc=on, the entire IO window is disabled for the bridges and for
> some reason nvidia doesn't abort in that case.

That actually makes a lot of sense.

At least on AMD hardware the IO window is only used for VGA emulation 
and I strongly suspect it's the same on the NVIDIA GPUs.

So what basically happens is that the BIOS, for some reason, enables the
I/O range on both GPUs, whereas Linux disables those ranges when it
reallocates. Most likely that is because the Linux PCI code knows they
should only be used if the device is the primary VGA device used during
boot.

Now when pci_enable_device() is called, the function checks whether all
enabled BARs actually have resources. Without realloc=on the I/O BAR is
enabled but has nothing allocated, so the function fails. With
realloc=on the BAR is disabled, so there is nothing left to check.
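
To make the difference concrete, here is a minimal Python sketch of that
check. It is a toy model, not the kernel code: the Bar class, the
enable_device() helper and the two example layouts are hypothetical and
only mirror the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    index: int
    kind: str      # "io" or "mem"; "" models a BAR that has been disabled
    claimed: bool  # True if the range was placed inside a bridge window


def enable_device(bars):
    """Return (ok, message), mimicking the check described above."""
    for bar in bars:
        if bar.kind not in ("io", "mem"):
            continue  # disabled BAR: nothing to verify
        if not bar.claimed:
            return False, f"BAR {bar.index} [{bar.kind}]: not claimed; can't enable device"
    return True, "enabled"


# BIOS layout (no realloc): I/O BAR 5 is enabled but has no valid range.
bios_layout = [Bar(1, "mem", True), Bar(5, "io", False)]
# After realloc: the I/O BAR was disabled instead of being left dangling.
realloc_layout = [Bar(1, "mem", True), Bar(5, "", False)]

print(enable_device(bios_layout))
print(enable_device(realloc_layout))
```

In this model the first layout fails on BAR 5 and the second one passes,
because the disabled BAR is skipped.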

Well, what a mess. @Dag, I would strongly suggest checking whether you
can update the BIOS. What happens here is clearly incorrect.

Regarding the resizing: as far as I can see, the BIOS allocates only a
single 1GiB window to the upstream bridge, which is most likely way too
small for anything other than the default 256MiB BAR.
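
A rough back-of-the-envelope check of why the window is too small,
sketched in Python. The assumptions are mine: BARs are power-of-two
sized and naturally aligned, the window base is aligned to the largest
BAR, and the 32MiB "other" BAR is a made-up placeholder for whatever
else shares the window. The fits() helper is hypothetical as well.

```python
MiB = 1 << 20
GiB = 1 << 30


def fits(window_size, bar_sizes):
    """Place naturally aligned BARs largest-first; True if all of them fit."""
    offset = 0
    for size in sorted(bar_sizes, reverse=True):
        offset = (offset + size - 1) & ~(size - 1)  # align up to the BAR size
        offset += size
    return offset <= window_size


window = 1 * GiB
other_bars = [32 * MiB]  # hypothetical second BAR in the same window

for vram_bar in (256 * MiB, 1 * GiB, 8 * GiB):
    print(vram_bar // MiB, "MiB:", fits(window, [vram_bar] + other_bars))
```

Under these assumptions the default 256MiB BAR fits, but a BAR resized
to 1GiB or more no longer does once anything else has to share the
window.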

Maybe try to force-assign more address space to this bridge. IIRC one of
the kernel parameters could be used for that, but offhand I don't
remember the syntax.

Regards,
Christian.




