From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@nvidia.com>
Cc: "Tushar Dave" <tdave@nvidia.com>,
"Cédric Le Goater" <clg@redhat.com>,
"Ard Biesheuvel" <ardb@kernel.org>,
"devel@edk2.groups.io" <devel@edk2.groups.io>,
qemu-devel@nongnu.org, jgg@nvidia.com, skolothumtho@nvidia.com,
qemu-arm@nongnu.org, peter.maydell@linaro.org,
marcel.apfelbaum@gmail.com
Subject: Re: [edk2-devel] [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation
Date: Tue, 12 May 2026 19:12:04 -0400
Message-ID: <20260512191140-mutt-send-email-mst@kernel.org>
In-Reply-To: <20260512170650.4551c9f6@nvidia.com>
On Tue, May 12, 2026 at 05:06:50PM -0600, Alex Williamson wrote:
> On Tue, 12 May 2026 12:25:45 -0500
> Tushar Dave <tdave@nvidia.com> wrote:
>
> > On 5/11/2026 6:43 AM, Ard Biesheuvel wrote:
> > > Hello Tushar,
> > >
> > > On Fri, 8 May 2026, at 20:37, Tushar Dave via groups.io wrote:
> > >> This RFC introduces a mechanism to specify Guest Physical Addresses
> > >> (GPAs) for PCI BARs, allowing explicit placement of guest MMIO BAR
> > >> addresses to match host physical addresses for assigned devices.
> > >>
> > >> On some platforms, P2P DMA is performed between devices within the same
> > >> IOMMU group. The PCI fabric ACS is configured to permit direct P2P
> > >> without going through the host bridge in order to achieve the required
> > >> performance.
> > >>
> > >> To support this multi-device IOMMU group P2P scenario in virtualization,
> > >> the VM may need to use the same MMIO BAR addresses as the host physical
> > >> address layout.
> > >>
> > >
> > > Did you consider implementing this using Enhanced Allocation (EA)? If so,
> > > could you explain why it is not suitable here?
> >
> > I have not evaluated EA for this design. When I looked at EDK2, I
> > chose PcdPciDisableBusEnumeration because it cleanly preserves fixed
> > BAR programming established by the hypervisor — at the cost of QEMU
> > performing PCI bus number and resource assignment.
> >
> > I did a quick search and do not see EA support in EDK2. Any pointers
> > to EA being used in a similar fashion to achieve fixed BAR placement
> > would be appreciated.
>
> EA wasn't on my radar either, but I did some research and chatted with
> Tushar and I think it could work. I'll sketch out a rough idea of what
> it might look like.
>
> EA describes BAR equivalents (fixed base address, size, and type) in a
> separate capability while the corresponding device BAR registers appear
> unimplemented. Linux already consumes endpoint EA capabilities and
> marks the resulting resources IORESOURCE_PCI_FIXED. EDK2 doesn't know
> about EA (cap 0x14 isn't defined anywhere in MdePkg, and PciBusDxe
> never consults it afaict), but that turns out to be useful here rather
> than a problem.
>
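
For readers unfamiliar with how a consumer would notice EA at all, here
is a minimal sketch of the standard capability-list walk that looks for
cap ID 0x14 and reads its entry count. The config-space image built by
make_demo_config() is invented for illustration, and the helper names
are hypothetical. The register offsets (Status at 0x06, Capabilities
Pointer at 0x34, Num Entries in the low six bits of the byte at cap+2)
follow my reading of the PCI spec and should be checked against it.

```python
PCI_STATUS = 0x06            # Status register offset
PCI_STATUS_CAP_LIST = 0x10   # "capabilities list present" bit
PCI_CAPABILITY_LIST = 0x34   # Capabilities Pointer offset
PCI_CAP_ID_EA = 0x14         # Enhanced Allocation capability ID


def find_capability(cfg: bytes, cap_id: int) -> int:
    """Return the config-space offset of cap_id, or 0 if it is absent."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against a looping capability list
    while pos >= 0x40 and pos not in seen:
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC  # next-capability pointer
    return 0


def ea_num_entries(cfg: bytes, pos: int) -> int:
    """Number of EA entries advertised by the capability at pos."""
    return cfg[pos + 2] & 0x3F


def make_demo_config() -> bytes:
    """Invented 256-byte config space: PM cap at 0x40, EA cap at 0x50."""
    cfg = bytearray(256)
    cfg[PCI_STATUS:PCI_STATUS + 2] = PCI_STATUS_CAP_LIST.to_bytes(2, "little")
    cfg[PCI_CAPABILITY_LIST] = 0x40
    cfg[0x40], cfg[0x41] = 0x01, 0x50           # power management, next -> 0x50
    cfg[0x50], cfg[0x51] = PCI_CAP_ID_EA, 0x00  # EA, end of list
    cfg[0x52] = 2                               # two EA entries
    return bytes(cfg)


if __name__ == "__main__":
    cfg = make_demo_config()
    pos = find_capability(cfg, PCI_CAP_ID_EA)
    print(hex(pos), ea_num_entries(cfg, pos))
```

A firmware that never looks for 0x14, as described above for EDK2,
simply walks past this capability and sees only the unimplemented BARs.
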
> Starting at the QEMU device, for a vfio-pci device we'd need to
> virtualize the real BARs as unimplemented and surface that information
> via a synthesized EA capability instead. It's debatable whether this
> is a generic PCI mechanism or vfio-pci specific, whether HPA is
> automatically used as the base address for vfio-pci devices or
> user-specified, and the capability offset in config space. None of
> those fundamentally change the shape of the flow.
>
> For the absolute bare-minimum level of support (EA device on the root
> complex, EA resources don't overlap the VM address space or MMIO range,
> EDK2 firmware, Linux guest booted with pci=nocrs) I think this actually
> works with just adding the EA capability above. Let's walk through
> those constraints and how we relax them.
>
> At the firmware level we lean on the real BAR registers being
> unimplemented for EA devices, so EDK2 allocates no MMIO or IO resources
> for them. Only bus numbers get assigned if the EA device sits in a PCI
> hierarchy. That's exactly what we want: EDK2 doing conventional bus
> assignment but staying out of the EA resource flow entirely.
>
> Instead of firmware EA enlightenment we lean on the guest OS. Linux
> reads endpoint EA today, but the bridge aperture sizing path ignores
> those fixed resources. As Tushar's series demonstrates, generically
> handling mixed "fixed-BAR" and programmable-BAR devices in one
> hierarchy is hard. An incremental Linux enhancement that greatly
> simplifies the problem space would be to program bridge apertures only
> for hierarchies consisting entirely of fixed resources. The math
> becomes trivial (window spans min..max of fixed children, aligned to
> bridge granularity), and there's no regression risk: these hierarchies
> currently fail silently, because the sizer ignores fixed children and
> the fixed-claim walk-up finds no containing parent. This enhancement,
> plus the homogeneous-hierarchy constraint, removes the root-complex
> constraint and lets us mirror the bare-metal topologies we need.
>
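
To make the "min..max of fixed children, aligned to bridge granularity"
arithmetic concrete, a rough sketch follows. It assumes the usual 1 MiB
granularity of PCI-to-PCI bridge memory windows and takes each fixed
resource as a (base, size) pair. The function name and the example
addresses are made up, and this is not code from the Linux resource
sizer.

```python
BRIDGE_MMIO_GRANULE = 1 << 20  # assumed 1 MiB bridge memory window granularity


def bridge_window(fixed, granule=BRIDGE_MMIO_GRANULE):
    """Return the inclusive (base, limit) window covering all fixed children.

    fixed is a non-empty list of (base, size) tuples describing the fixed
    resources below one bridge.
    """
    if not fixed:
        raise ValueError("no fixed resources below this bridge")
    lowest = min(base for base, _ in fixed)
    highest_end = max(base + size for base, size in fixed)  # exclusive end
    win_base = lowest & ~(granule - 1)                       # round down
    win_end = (highest_end + granule - 1) & ~(granule - 1)   # round up
    return win_base, win_end - 1


if __name__ == "__main__":
    # Two hypothetical fixed resources below the same bridge.
    children = [(0x8000_8000, 0x1000), (0x8023_0000, 0x1_0000)]
    base, limit = bridge_window(children)
    print(hex(base), hex(limit))
```

With only fixed children there is nothing to pack or reorder, so the
window is fully determined by the children; a parent bridge would apply
the same calculation to the windows of its subordinate bridges.
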
> Resource ranges are a bit messier. The extent of the EA device ranges
> could be determined in QEMU and the VM address map adjusted to prevent
> overlap. Tushar already has a similar user-specified machine option in
> this series. That range also needs to reach the guest as a CRS (to
> avoid pci=nocrs) but needs to stay distinct from the DT range passed to
> EDK2 for programmable BAR devices so EDK2 won't place a programmable
> BAR or bridge window into the EA region. So long as we keep EA and
> programmable devices in separate hierarchies, EDK2 only needs the
> programmable range via DT and we can add the EA range as additional CRS
> ranges visible only to the guest.
>
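
The overlap check implied above can be sketched as follows, assuming
the EA ranges and the existing guest regions are both known as
(base, size) pairs at machine-init time. The region names and every
address below are invented for illustration and do not reflect the
actual virt machine memory map or any QEMU API.

```python
def overlaps(a, b):
    """True if two (base, size) ranges share at least one address."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def find_conflicts(ea_ranges, guest_map):
    """List (region_name, ea_range) pairs where an EA range hits a region.

    guest_map maps a region name to its (base, size); ea_ranges is a list
    of (base, size) tuples taken from the EA capabilities.
    """
    return [
        (name, ea)
        for ea in ea_ranges
        for name, region in guest_map.items()
        if overlaps(ea, region)
    ]


if __name__ == "__main__":
    # Hypothetical guest layout and EA ranges.
    guest_map = {
        "ram": (0x4000_0000, 0x8000_0000),
        "pcie-mmio": (0x1000_0000, 0x2EFF_0000),
    }
    ea_ranges = [
        (0x2000_0000, 0x0100_0000),            # lands inside "pcie-mmio"
        (0x6000_0000_0000, 0x4_0000_0000),     # clear of everything
    ]
    for name, ea in find_conflicts(ea_ranges, guest_map):
        print(f"EA range {ea[0]:#x}+{ea[1]:#x} overlaps {name}")
```

Any conflict reported this way is a region that would have to move or
be split so that the EA range can be described in its own CRS entry.
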
> In practice, EDK2 programs all the programmable devices and the EA
> devices live entirely in the additional CRS. A possibly cleaner
> alternative is additional PXB host bridges for the EA devices, each
> with its own CRS. That sidesteps the DT/CRS split entirely since the
> EA PXB has nothing for EDK2 to allocate anyway.
>
> If we agree that homogeneous hierarchies (no mixing of EA and
> programmable BARs) are a reasonable constraint, and possibly extend that
> to homogeneous per host bridge to simplify the CRS mapping, we have the
> following work items:
>
> * Extend Linux EA support to program bridge apertures for subordinate
> homogeneous EA hierarchies.
>
> * Develop options to virtualize programmable BARs as EA for vfio-pci
> devices, if not generically for the benefit of testing.
>
> * Implement a way to poke holes in the VM address space and plumb
> through to account for addresses used by EA devices.
>
> * Provide those same ranges to the guest via CRS (but not via DT to
> EDK2), or alternatively expose them through additional PXB host
> bridges.
>
> Does that shape roughly seem accurate? Are there additional gaps I've
> missed? Thanks,
>
> Alex

Just one question: why not do this in firmware, so that Windows
guests are conceivably handled as well?

Thread overview: 23+ messages
2026-05-08 18:37 [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 1/8] hw/pci: add fixed-bars property to allow fixed BAR addresses Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 2/8] hw/pci: enumerate PCI bus and program bridge bus numbers Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 3/8] hw/pci: introduce allocator for fixed BAR placement Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 4/8] hw/pci: pack remaining BARs and update bridge windows Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 5/8] hw/pci: allocate remaining BARs for buses without fixed BARs Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 6/8] hw/pci: finalize bridge prefetch windows after BAR allocation Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 7/8] hw/arm/virt: add pcie-mmio-window machine property Tushar Dave
2026-05-08 18:37 ` [RFC PATCH 8/8] hw/arm/virt: add pci-pre-enum " Tushar Dave
2026-05-11 7:46 ` [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation Peter Maydell
2026-05-11 12:26 ` Jason Gunthorpe
2026-05-11 18:38 ` Mohamed Mediouni
2026-05-11 20:28 ` Jason Gunthorpe
2026-05-11 9:09 ` Michael S. Tsirkin
2026-05-11 18:10 ` Tushar Dave
2026-05-11 22:09 ` Michael S. Tsirkin
2026-05-11 11:43 ` [edk2-devel] " Ard Biesheuvel
2026-05-12 17:25 ` Tushar Dave
2026-05-12 23:06 ` Alex Williamson
2026-05-12 23:12 ` Michael S. Tsirkin [this message]
2026-05-12 23:57 ` Alex Williamson
2026-05-13 11:36 ` Jason Gunthorpe
2026-05-13 14:25 ` Ard Biesheuvel