Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Stewart Hildebrand <stewart.hildebrand@amd.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 3/8] PCI: Restore resource alignment
Date: Wed, 14 Aug 2024 14:12:08 -0400	[thread overview]
Message-ID: <cc986313-4eb3-4ec9-a217-5d8dac141fe9@amd.com> (raw)
In-Reply-To: <20240808215443.GA158993@bhelgaas>

On 8/8/24 17:54, Bjorn Helgaas wrote:
> On Thu, Aug 08, 2024 at 04:28:50PM -0400, Stewart Hildebrand wrote:
>> On 8/8/24 15:28, Bjorn Helgaas wrote:
>>> On Wed, Aug 07, 2024 at 11:17:12AM -0400, Stewart Hildebrand wrote:
>>>> Devices with alignment specified will lose their alignment in cases when
>>>> the bridge resources have been released, e.g. due to insufficient bridge
>>>> window size. Restore the alignment.
>>>
>>> I guess this fixes a problem when the user has specified
>>> "pci=resource_alignment=..." and we've decided to release and
>>> reallocate a bridge window?  Just looking for a bit more concrete
>>> description of what this problem would look like to a user.
>>
>> Yes. When alignment has been specified via pcibios_default_alignment()
>> or by the user with "pci=resource_alignment=...", and the bridge window
>> is being reallocated, the specified alignment is lost and the resource
>> may not be sufficiently aligned after reallocation.
>>
>> I can expand the commit description.
> 
> I think a hint about where the alignment gets lost would be helpful,
> too.
> 
> This seems like a problem users could be seeing today, even
> independent of the device passthrough issue that I think is the main
> thrust of this series.  If there's a problem report or an easy way to
> reproduce this problem, that would be nice, too.

Oh, sorry, I just realized that it's only alignments with
IORESOURCE_STARTALIGN that get lost during bridge window realloc.
Specifically, r->start gets overwritten in __release_child_resources().
There are a few code paths, such as the one in
__release_child_resources(), that assume IORESOURCE_SIZEALIGN. In case
of IORESOURCE_SIZEALIGN, the alignment is restored by a simple
calculation. However, with IORESOURCE_STARTALIGN, we can't use a simple
calculation, instead we need to consult
pci_reassigndev_resource_alignment() to restore the alignment. I'll
update the commit message.

An alternative approach might be to store the alignment in a new member
in struct resource, thus saving us from having to call
pci_reassigndev_resource_alignment() for each PCI device on the bridge
undergoing reallocation.

BTW, this patch and the two "[powerpc,x86]/pci: Preserve
IORESOURCE_STARTALIGN alignment" patches could potentially be folded
into a single patch as they're all dealing with fixing
IORESOURCE_STARTALIGN.

To repro the issue on x86, we will need to apply this series except the
current patch [3/8].

I ran into the issue with the following device. Let's call it example 1.
Scrubbed/partial output from lspci -vv:

	Region 0: Memory at d1924600 (32-bit, non-prefetchable) [size=256]
	Region 1: Memory at d1924400 (32-bit, non-prefetchable) [size=512]
	Region 2: Memory at d1924000 (32-bit, non-prefetchable) [size=1K]
	Region 3: Memory at d1920000 (32-bit, non-prefetchable) [size=16K]
	Region 4: Memory at d1900000 (32-bit, non-prefetchable) [size=128K]
	Region 5: Memory at d1800000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [b0] MSI-X: Enable- Count=2 Masked-
		Vector table: BAR=0 offset=00000080
		PBA: BAR=0 offset=00000090
	Capabilities: [200 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy+
		IOVSta:	Migration-
		Initial VFs: 4, Total VFs: 4, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 6, stride: 1, Device ID: 0100
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 00000000d0800000 (64-bit, non-prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Kernel driver in use: pci-endpoint-test
	Kernel modules: pci_endpoint_test

BARs 0, 1, and 2 are small, and the host firmware placed them on the
same page. The host firmware did not page align the BARs. There is also
an SR-IOV BAR that the firmware couldn't fit in the bridge window.

In example 1, I did not specify pci=realloc=on. I used a kernel with
CONFIG_PCI_REALLOC_ENABLE_AUTO=y, and the SR-IOV BAR triggered the
bridge window realloc. Alignment was requested for BARs 0, 1, and 2 of
this device, using IORESOURCE_STARTALIGN, because of patch [8/8]. The
alignment was lost during realloc.

Such a device from example 1 may be hard to come by, so here's a way to
reproduce with qemu. Example 2:

01:00.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe800000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c000 [size=256]
        Memory at 7050000000 (64-bit, prefetchable) [size=32]

01:01.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe801000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c100 [size=256]
        Memory at 7050000020 (64-bit, prefetchable) [size=32]

01:02.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe802000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c200 [size=256]
        Memory at 7050000040 (64-bit, prefetchable) [size=32]

01:03.0 Unclassified device [00ff]: Red Hat, Inc. QEMU PCI Test Device
        Subsystem: Red Hat, Inc. QEMU Virtual Machine
        Flags: fast devsel
        Memory at fe803000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at c300 [size=256]
        Memory at 7040000000 (64-bit, prefetchable) [size=256M]

This example can reproduced with Qemu's pci-testdev and a SeaBIOS hack.
In this case we will need to specify pci=realloc=on to trigger the
realloc because there's no SR-IOV BAR to trigger it automatically. Add
this to your usual qemu-system-x86_64 args:

    -append "console=ttyS0 ignore_loglevel pci=realloc=on" \
    -device pcie-pci-bridge,id=pcie.1 \
    -device pci-testdev,bus=pcie.1,membar=32 \
    -device pci-testdev,bus=pcie.1,membar=32 \
    -device pci-testdev,bus=pcie.1,membar=32 \
    -device pci-testdev,bus=pcie.1,membar=268435456

Apply this SeaBIOS hack:

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index b3e359d7..769007a4 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -25,7 +25,7 @@
 #include "util.h" // pci_setup
 #include "x86.h" // outb

-#define PCI_DEVICE_MEM_MIN    (1<<12)  // 4k == page size
+#define PCI_DEVICE_MEM_MIN    (0)
 #define PCI_BRIDGE_MEM_MIN    (1<<21)  // 2M == hugepage size
 #define PCI_BRIDGE_IO_MIN      0x1000  // mandated by pci bridge spec
@@ -1089,6 +1089,7 @@ pci_region_map_one_entry(struct pci_region_entry *entry, u64 addr)
         pci_config_writew(bdf, PCI_MEMORY_LIMIT, limit >> PCI_MEMORY_SHIFT);
     }
     if (entry->type == PCI_REGION_TYPE_PREFMEM) {
+        limit = addr + PCI_BRIDGE_MEM_MIN - 1;
         pci_config_writew(bdf, PCI_PREF_MEMORY_BASE, addr >> PCI_PREF_MEMORY_SHIFT);
         pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT, limit >> PCI_PREF_MEMORY_SHIFT);
         pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, addr >> 32);


  reply	other threads:[~2024-08-14 18:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-07 15:17 [PATCH v3 0/8] PCI: Align small BARs Stewart Hildebrand
2024-08-07 15:17 ` [PATCH v3 1/8] x86/PCI: Improve code readability Stewart Hildebrand
2024-08-08  8:55   ` Ilpo Järvinen
2024-08-07 15:17 ` [PATCH v3 2/8] PCI: Don't unnecessarily disable memory decoding Stewart Hildebrand
2024-08-07 15:17 ` [PATCH v3 3/8] PCI: Restore resource alignment Stewart Hildebrand
2024-08-08 19:28   ` Bjorn Helgaas
2024-08-08 20:28     ` Stewart Hildebrand
2024-08-08 21:54       ` Bjorn Helgaas
2024-08-14 18:12         ` Stewart Hildebrand [this message]
2024-08-07 15:17 ` [PATCH v3 4/8] PCI: Restore memory decoding after reallocation Stewart Hildebrand
2024-08-08 19:37   ` Bjorn Helgaas
2024-08-14 18:31     ` Stewart Hildebrand
2024-08-07 15:17 ` [PATCH v3 5/8] x86/PCI: Preserve IORESOURCE_STARTALIGN alignment Stewart Hildebrand
2024-08-08 20:10   ` Bjorn Helgaas
2024-08-07 15:17 ` [PATCH v3 6/8] powerpc/pci: " Stewart Hildebrand
2024-08-07 15:17 ` [PATCH v3 7/8] PCI: Don't reassign resources that are already aligned Stewart Hildebrand
2024-08-07 15:17 ` [PATCH v3 8/8] PCI: Align small BARs Stewart Hildebrand
2024-08-08 21:53   ` Bjorn Helgaas
2024-08-14 13:55     ` Stewart Hildebrand
2024-08-14 22:05       ` Bjorn Helgaas
2024-08-07 15:42 ` [PATCH v3 0/8] " Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cc986313-4eb3-4ec9-a217-5d8dac141fe9@amd.com \
    --to=stewart.hildebrand@amd.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox