* A lingering doubt on PCI-MMIO region of PCI-passthrough-device
@ 2025-12-14 12:08 Ajay Garg
  2025-12-14 19:52 ` Alex Williamson
  0 siblings, 1 reply; 7+ messages in thread
From: Ajay Garg @ 2025-12-14 12:08 UTC (permalink / raw)
To: iommu, linux-pci, Linux Kernel Mailing List

Hi everyone.

Let's assume an x86_64 Linux host and guest, with full virtualization
and IOMMU hardware capabilities.

Before launching the VM, QEMU, with the help of VFIO, "installs" "dev1"
on the guest's virtual PCI root complex. After bootup, the guest does
the usual enumeration, finds "dev1" on the PCI bus, and programs the
BARs in its domain.

However, there is no guarantee that the guest's PCI-MMIO physical
ranges would be identical to "what would have been" the host's PCI-MMIO
physical ranges. Then how do the EPT/SLAT tables get set up for the
correct GPA => HPA mapping of dev1's BAR MMIO regions?

Will be grateful for pointers.

Thanks and Regards,
Ajay

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device
  2025-12-14 12:08 A lingering doubt on PCI-MMIO region of PCI-passthrough-device Ajay Garg
@ 2025-12-14 19:52 ` Alex Williamson
  2025-12-15  3:50   ` Ajay Garg
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Williamson @ 2025-12-14 19:52 UTC (permalink / raw)
To: Ajay Garg; +Cc: iommu, linux-pci, Linux Kernel Mailing List

On Sun, 14 Dec 2025 17:38:50 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> Hi everyone.
>
> Let's assume an x86_64 Linux host and guest, with full virtualization
> and IOMMU hardware capabilities.
>
> Before launching the VM, QEMU, with the help of VFIO, "installs"
> "dev1" on the guest's virtual PCI root complex. After bootup, the
> guest does the usual enumeration, finds "dev1" on the PCI bus, and
> programs the BARs in its domain.
>
> However, there is no guarantee that the guest's PCI-MMIO physical
> ranges would be identical to "what would have been" the host's
> PCI-MMIO physical ranges. Then how do the EPT/SLAT tables get set up
> for the correct GPA => HPA mapping of dev1's BAR MMIO regions?

The guest doesn't get to program the device's physical BARs, nor is an
identity mapping required in the guest. The BAR programming is
virtualized. QEMU mmaps the BAR, and that mmap is the backing for the
mapping into the guest.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread
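[Editor's note: the key idea in Alex's answer — a single host mapping
backing both QEMU's view and the guest's view of the BAR — can be
sketched as a toy model. This is plain Python with a temp file standing
in for the VFIO-provided BAR region; nothing here is the real VFIO API.]

```python
import mmap
import os
import tempfile

# Toy illustration (plain file, not real VFIO): one backing object,
# two views of it. In the real flow, QEMU mmap()s the device BAR via
# the VFIO region file offset, and that same mapping backs the guest's
# physical address range, so guest MMIO accesses need no copying.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

device_view = mmap.mmap(fd, 4096)  # stands in for the host-side mmap
guest_view = mmap.mmap(fd, 4096)   # stands in for the guest-visible view

device_view[0:4] = b"\xde\xad\xbe\xef"  # "device" updates a register
shared_ok = guest_view[0:4] == b"\xde\xad\xbe\xef"  # guest sees it at once

device_view.close()
guest_view.close()
os.close(fd)
os.unlink(path)
print("shared backing:", shared_ok)
```

Both views observe the same bytes because they share one backing
object, which is the property Alex is describing: no trap-and-emulate
is needed for the BAR contents themselves, only for the BAR registers.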
* Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device
  2025-12-14 19:52 ` Alex Williamson
@ 2025-12-15  3:50   ` Ajay Garg
  2025-12-19  6:23     ` Ajay Garg
  0 siblings, 1 reply; 7+ messages in thread
From: Ajay Garg @ 2025-12-15 3:50 UTC (permalink / raw)
To: Alex Williamson; +Cc: iommu, linux-pci, Linux Kernel Mailing List

Thanks Alex.

So does something like the following happen:

i)
During bootup, the guest starts PCI enumeration as usual.

ii)
Upon discovering the passthrough device, the guest carves out the MMIO
regions (as usual) in its physical address space, and attempts to
program the BARs with the guest-physical base addresses it carved out.

iii)
These attempts to program the BARs (lying in the passthrough device's
config space) are intercepted by the hypervisor instead (causing a
VM-exit in the interim).

iv)
The hypervisor uses the above info to update the EPT, ensuring GPA =>
HPA translations go fine when the guest later accesses the PCI-MMIO
regions (once the guest is fully booted up). Also, the hypervisor marks
the operation as a success (without "really" re-programming the BARs).

v)
VM-entry resumes the guest, which is left with the "impression" that
the BARs have been "programmed by the guest".

Is the above sequencing correct at a bird's-eye level?

Once again, many thanks for the help!

Thanks and Regards,
Ajay

^ permalink raw reply	[flat|nested] 7+ messages in thread
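[Editor's note: steps iii)-v) above can be sketched as a toy model of
the intercept-and-remap idea. Everything below — the addresses, the
flat dict standing in for the EPT, the function names — is invented for
illustration; real KVM uses multi-level tables and VM-exits, not Python.]

```python
# Toy model of the sequence above: a guest write to a virtualized BAR
# is "intercepted", the emulated BAR value is updated, and GPA -> HPA
# mappings (standing in for EPT entries) are installed. No real BAR
# reprogramming reaches the physical device.

PAGE = 0x1000
HOST_BAR_HPA = 0xFEB0_0000   # hypothetical host-physical BAR address
ept = {}                     # GPA page -> HPA page (toy "EPT")
emulated_bar = 0             # value the guest reads back

def guest_writes_bar(gpa_base):
    """Intercept the config-space write; map a hypothetical 16 KiB BAR."""
    global emulated_bar
    emulated_bar = gpa_base                  # virtualized BAR register
    for page in range(0, 4 * PAGE, PAGE):
        ept[gpa_base + page] = HOST_BAR_HPA + page

def guest_mmio_access(gpa):
    """Second-stage translation for a guest MMIO access."""
    page, off = gpa & ~(PAGE - 1), gpa & (PAGE - 1)
    return ept[page] + off

guest_writes_bar(0xC000_0000)
assert emulated_bar == 0xC000_0000           # guest sees its own value
assert guest_mmio_access(0xC000_0008) == 0xFEB0_0008
print("toy intercept ok")
```

Note that, per Alex's reply, the real mechanism works through QEMU's
mmap and KVM memory slots rather than per-page EPT edits made directly
by the BAR-write handler, but the observable effect for the guest is
the same as in this sketch.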
* Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device
  2025-12-15  3:50   ` Ajay Garg
@ 2025-12-19  6:23     ` Ajay Garg
  2025-12-20  0:06       ` Alex Williamson
  0 siblings, 1 reply; 7+ messages in thread
From: Ajay Garg @ 2025-12-19 6:23 UTC (permalink / raw)
To: Alex Williamson, QEMU Developers
Cc: iommu, linux-pci, Linux Kernel Mailing List

Hi Alex.
Kindly help with whether the steps listed in the previous email are
correct.

(Have added the qemu mailing-list too, as it might be a QEMU thing as
well, since virtual PCI is in the picture.)

On Mon, Dec 15, 2025 at 9:20 AM Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> Thanks Alex.
>
> So does something like the following happen :
>
> i)
> During bootup, guest starts pci-enumeration as usual.
>
> ii)
> Upon discovering the "passthrough-device", guest carves the physical
> MMIO regions (as usual) in the guest's physical-address-space, and
> starts-to/attempts to program the BARs with the
> guest-physical-base-addresses carved out.
>
> iii)
> These attempts to program the BARs (lying in the
> "passthrough-device"'s config-space), are intercepted by the
> hypervisor instead (causing a VM-exit in the interim).
>
> iv)
> The hypervisor uses the above info to update the EPT, to ensure GPA =>
> HPA conversions go fine when the guest tries to access the PCI-MMIO
> regions later (once the guest is fully booted up). Also, the hypervisor
> marks the operation as success (without "really" re-programming the
> BARs).
>
> v)
> The VM-entry is called, and the guest resumes with the "impression"
> that the BARs have been "programmed by guest".
>
> Is the above sequencing correct at a bird's view level?
>
>
> Once again, many thanks for the help !
>
> Thanks and Regards,
> Ajay

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device
  2025-12-19  6:23     ` Ajay Garg
@ 2025-12-20  0:06       ` Alex Williamson
  2025-12-20 12:52         ` Ajay Garg
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Williamson @ 2025-12-20 0:06 UTC (permalink / raw)
To: Ajay Garg; +Cc: QEMU Developers, iommu, linux-pci, Linux Kernel Mailing List

On Fri, 19 Dec 2025 11:53:56 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> Hi Alex.
> Kindly help if the steps listed in the previous email are correct.
>
> (Have added qemu mailing-list too, as it might be a qemu thing too as
> virtual-pci is in picture).
>
> On Mon, Dec 15, 2025 at 9:20 AM Ajay Garg <ajaygargnsit@gmail.com> wrote:
> >
> > Thanks Alex.
> >
> > So does something like the following happen :
> >
> > i)
> > During bootup, guest starts pci-enumeration as usual.
> >
> > ii)
> > Upon discovering the "passthrough-device", guest carves the physical
> > MMIO regions (as usual) in the guest's physical-address-space, and
> > starts-to/attempts to program the BARs with the
> > guest-physical-base-addresses carved out.
> >
> > iii)
> > These attempts to program the BARs (lying in the
> > "passthrough-device"'s config-space), are intercepted by the
> > hypervisor instead (causing a VM-exit in the interim).
> >
> > iv)
> > The hypervisor uses the above info to update the EPT, to ensure GPA =>
> > HPA conversions go fine when the guest tries to access the PCI-MMIO
> > regions later (once the guest is fully booted up). Also, the hypervisor
> > marks the operation as success (without "really" re-programming the
> > BARs).
> >
> > v)
> > The VM-entry is called, and the guest resumes with the "impression"
> > that the BARs have been "programmed by guest".
> >
> > Is the above sequencing correct at a bird's view level?

It's not far off.  The key is simply that we can create a host virtual
mapping to the device BARs, i.e. an mmap.
The guest enumerates emulated BARs; they're only used for sizing and
locating the BARs in the guest physical address space.  When the guest
BAR is programmed and memory is enabled, the address space in QEMU is
populated at the BAR-indicated GPA using the mmap backing.  KVM memory
slots are used to fill the mappings in the vCPU.  The same BAR mmap is
also used to provide DMA mapping of the BAR through the IOMMU in the
legacy type1 IOMMU backend case.  Barring a vIOMMU, the IOMMU IOVA
space is the guest physical address space.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread
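[Editor's note: the "sizing" that the emulated BARs serve can be
illustrated with the standard BAR sizing handshake: software writes all
1s to the BAR, reads back, and the zero low bits reveal the size. The
sketch below is a toy device model in Python, with a made-up 64 KiB
32-bit memory BAR; it is not QEMU's actual implementation.]

```python
# Toy emulated 32-bit memory BAR. Address bits at and above the BAR
# size are writable; bits below it always read back as zero, which is
# how enumeration software discovers the size.

BAR_SIZE = 0x10000          # 64 KiB, hypothetical
BAR_TYPE_BITS = 0b0000      # 32-bit, non-prefetchable memory BAR

bar_reg = 0                 # the emulated BAR register

def bar_write(val):
    global bar_reg
    # Keep only the writable address bits, then re-append the type bits.
    bar_reg = (val & ~(BAR_SIZE - 1) & 0xFFFF_FFF0) | BAR_TYPE_BITS

def bar_read():
    return bar_reg

bar_write(0xFFFF_FFFF)      # the sizing probe: write all 1s
readback = bar_read()
size = (~(readback & 0xFFFF_FFF0) + 1) & 0xFFFF_FFFF
assert size == BAR_SIZE

bar_write(0xC000_0000)      # then program the real guest-physical base
assert bar_read() == 0xC000_0000
print(hex(size))
```

Because this sizing dance runs entirely against the emulated register,
the host's physical BAR is never disturbed, which is exactly why the
guest's BAR addresses are free to differ from the host's.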
* Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device
  2025-12-20  0:06       ` Alex Williamson
@ 2025-12-20 12:52         ` Ajay Garg
  2025-12-20 13:24           ` Ajay Garg
  0 siblings, 1 reply; 7+ messages in thread
From: Ajay Garg @ 2025-12-20 12:52 UTC (permalink / raw)
To: Alex Williamson
Cc: QEMU Developers, iommu, linux-pci, Linux Kernel Mailing List

Thanks Alex.

I was/am aware of GPA ranges being backed by mmap'ed HVA ranges. On
further thought, I think I have all the missing pieces (except one,
mentioned at the end of this email).

I'll list the steps below:

a)
There are three stages:

* pre-configuration by the host/QEMU.
* guest-VM BIOS.
* guest-VM kernel.

b)
The host procures the following memory slots (amongst others) via mmap:

* guest RAM.
* PCI config space: with the help of VFIO's ioctls.
* PCI BAR MMIO space: with the help of VFIO's ioctls.

For the above memory slots:

* Guest RAM's base guest-physical address is known (0), so EPT
  mappings for guest RAM are set up even before the guest VM begins to
  boot.
* There is no concept of a guest-physical address for PCI config space.
* The PCI BAR MMIO space's guest-physical address is not known yet, so
  EPT mappings for it are not set up (yet).

c)
QEMU starts the guest, and the guest-VM BIOS runs next.

This BIOS is "owned by QEMU", and is "definitely different" from the
host BIOS (QEMU is an altogether different "hardware"); the QEMU BIOS
and the host BIOS handle PCI bus enumeration "completely differently".

When PCI enumeration runs during this guest-VM-BIOS stage, it accesses
the PCI device's config space (backed on the host by mmap'ed mappings).
Note that the guest kernel is still not in the picture.

"OBVIOUSLY", all accesses (reads/writes) to PCI config space go to the
PCI-config-space memory slot (handled purely by QEMU/BIOS code).

Once the guest-VM BIOS carves out guest-physical addresses for the PCI
device's BARs, it programs the BARs by writing to the BAR offsets in
the PCI config space.
QEMU detects this, and does the following:

* It does not relay the actual writes to the physical BARs on the host.
* Since the BAR guest-physical addresses are now known, the missing
  EPT mappings for the PCI BAR MMIO space are now set up.

d)
Finally, the guest kernel takes over, and:

* all accesses to RAM go through the vanilla two-stage translation.
* all accesses to the PCI BAR MMIO regions go through the vanilla
  two-stage translation.

Requests:

i)
Alex / QEMU experts: kindly correct me if I am wrong :) till now.

ii)
Once the kernel boots up, how are accesses to PCI config space
handled? Is the QEMU BIOS again involved in PCI-config-space accesses
after the guest kernel has booted up?

Once again, many thanks to everyone for their time and help.

Thanks and Regards,
Ajay

On Sat, Dec 20, 2025 at 5:36 AM Alex Williamson <alex@shazbot.org> wrote:
>
> On Fri, 19 Dec 2025 11:53:56 +0530
> Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> > Hi Alex.
> > Kindly help if the steps listed in the previous email are correct.
> >
> > (Have added qemu mailing-list too, as it might be a qemu thing too as
> > virtual-pci is in picture).
> >
> > On Mon, Dec 15, 2025 at 9:20 AM Ajay Garg <ajaygargnsit@gmail.com> wrote:
> > >
> > > Thanks Alex.
> > >
> > > So does something like the following happen :
> > >
> > > i)
> > > During bootup, guest starts pci-enumeration as usual.
> > >
> > > ii)
> > > Upon discovering the "passthrough-device", guest carves the physical
> > > MMIO regions (as usual) in the guest's physical-address-space, and
> > > starts-to/attempts to program the BARs with the
> > > guest-physical-base-addresses carved out.
> > >
> > > iii)
> > > These attempts to program the BARs (lying in the
> > > "passthrough-device"'s config-space), are intercepted by the
> > > hypervisor instead (causing a VM-exit in the interim).
> > >
> > > iv)
> > > The hypervisor uses the above info to update the EPT, to ensure GPA =>
> > > HPA conversions go fine when the guest tries to access the PCI-MMIO
> > > regions later (once the guest is fully booted up). Also, the hypervisor
> > > marks the operation as success (without "really" re-programming the
> > > BARs).
> > >
> > > v)
> > > The VM-entry is called, and the guest resumes with the "impression"
> > > that the BARs have been "programmed by guest".
> > >
> > > Is the above sequencing correct at a bird's view level?
>
> It's not far off.  The key is simply that we can create a host virtual
> mapping to the device BARs, ie. an mmap.  The guest enumerates emulated
> BARs, they're only used for sizing and locating the BARs in the guest
> physical address space.  When the guest BAR is programmed and memory
> enabled, the address space in QEMU is populated at the BAR indicated
> GPA using the mmap backing.  KVM memory slots are used to fill the
> mappings in the vCPU.  The same BAR mmap is also used to provide DMA
> mapping of the BAR through the IOMMU in the legacy type1 IOMMU backend
> case.  Barring a vIOMMU, the IOMMU IOVA space is the guest physical
> address space.  Thanks,
>
> Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread
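[Editor's note: the "vanilla two-stage translation" in step d) above
can be sketched as follows. Flat dicts stand in for the multi-level
stage-1 (guest page tables, GVA -> GPA) and stage-2 (EPT, GPA -> HPA)
structures; all addresses here are invented for illustration.]

```python
# Toy two-stage address translation: stage 1 is guest-owned, stage 2
# is host-owned (the EPT). A guest MMIO access walks both.

PAGE = 0x1000
stage1 = {0x0000_4000: 0xC000_0000}    # GVA page -> GPA page (guest BAR va)
stage2 = {0xC000_0000: 0xFEB0_0000,    # GPA page -> HPA page (BAR, via EPT)
          0x0000_0000: 0x1234_0000}    # guest RAM at GPA 0, hypothetical HPA

def translate(gva):
    off = gva & (PAGE - 1)
    gpa = stage1[gva & ~(PAGE - 1)] + off                   # stage 1 walk
    return stage2[gpa & ~(PAGE - 1)] + (gpa & (PAGE - 1))   # stage 2 walk

# A guest-virtual access inside the BAR mapping ends up at the host BAR:
assert translate(0x4010) == 0xFEB0_0010
print("two-stage ok")
```

The point of the sketch is that RAM and passthrough-BAR accesses are
indistinguishable at translation time: both are just GPA pages with
stage-2 entries, exactly as step d) claims.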
* Re: A lingering doubt on PCI-MMIO region of PCI-passthrough-device
  2025-12-20 12:52         ` Ajay Garg
@ 2025-12-20 13:24           ` Ajay Garg
  0 siblings, 0 replies; 7+ messages in thread
From: Ajay Garg @ 2025-12-20 13:24 UTC (permalink / raw)
To: Alex Williamson
Cc: QEMU Developers, iommu, linux-pci, Linux Kernel Mailing List

Are the guest ACPI MCFG/MMCONFIG tables the answer to my last
question? :)

i.e. the QEMU BIOS sets up the ACPI MCFG / MMCONFIG addresses, which
are backed by the mmap'ed PCI-config-space mappings on the host (while
also setting up EPT mappings for the PCI config space)?

On Sat, Dec 20, 2025 at 6:22 PM Ajay Garg <ajaygargnsit@gmail.com> wrote:
>
> Thanks Alex.
>
> I was/am aware of GPA-ranges backed by mmap'ed HVA-ranges.
> On further thought, I think I have all the missing pieces (except one,
> as mentioned at last in current email).
>
> I'll list the steps below :
>
> a)
> There are three stages :
>
> * pre-configuration by host/qemu.
> * guest-vm bios.
> * guest-vm kernel.
>
> b)
> Host procures following memory-slots (amongst others) via mmap :
>
> * guest-ram
> * pci-config-space : via vfio's ioctls' help.
> * pci-bar-mmio-space : via vfio's ioctls' help.
>
> For the above memory-slots,
>
> * guest-ram physical-address is known (0), so ept-mappings for
> guest-ram are set up even before guest-vm begins to boot up.
>
> * there is no concept of guest-physical-address for pci-config-space.
>
> * pci-bar-mmio-space physical address is not known yet, so
> ept-mappings for pci-bar-mmio-space are not set up (yet).
>
> c)
> qemu starts the guest, and guest-vm-bios runs next.
>
> This bios is "owned by qemu", and is "definitely different" from the
> host-bios (qemu is an altogether different "hardware"). qemu-bios and
> host-bios handle pci bus/enumeration "completely differently".
>
> When the pci-enumeration runs during this guest-vm-bios stage, it
> accesses the pci-device config-space (backed on the host by mmap'ed
> mappings).
> Note that guest-kernel is still not in picture.
>
> "OBVIOUSLY", all accesses (reads/writes) to pci-config space go to the
> pci-config-space memory-slot (handled purely by qemu-bios code).
>
> Once the guest-vm bios carves out guest-physical-addresses for the
> pci-device-bars, it programs the bars by writing to bars-offsets in
> the pci-config-space. qemu detects this, and does the following :
>
> * does not relay the actual-writes to physical bars on the host.
> * since the bar-guest-physical-addresses are now known, the missing
> ept-mappings for pci-bar-mmio-space are now set up.
>
> d)
> Finally, guest-kernel takes over, and
>
> * all accesses to ram go through vanilla two-stage translation.
> * all accesses to pci-bars-mmio go through vanilla two-stage translation.
>
> Requests :
>
> i)
> Alex / QEMU-experts : kindly correct me if I am wrong :) till now.
>
> ii)
> Once kernel boots up, how are accesses to pci-config-space handled? Is
> again qemu-bios involved in pci-config-space accesses after
> guest-kernel has booted up?
>
> Once again, many thanks to everyone for their time and help.
>
> Thanks and Regards,
> Ajay
>
> On Sat, Dec 20, 2025 at 5:36 AM Alex Williamson <alex@shazbot.org> wrote:
> >
> > On Fri, 19 Dec 2025 11:53:56 +0530
> > Ajay Garg <ajaygargnsit@gmail.com> wrote:
> >
> > > Hi Alex.
> > > Kindly help if the steps listed in the previous email are correct.
> > >
> > > (Have added qemu mailing-list too, as it might be a qemu thing too as
> > > virtual-pci is in picture).
> > >
> > > On Mon, Dec 15, 2025 at 9:20 AM Ajay Garg <ajaygargnsit@gmail.com> wrote:
> > > >
> > > > Thanks Alex.
> > > >
> > > > So does something like the following happen :
> > > >
> > > > i)
> > > > During bootup, guest starts pci-enumeration as usual.
> > > >
> > > > ii)
> > > > Upon discovering the "passthrough-device", guest carves the physical
> > > > MMIO regions (as usual) in the guest's physical-address-space, and
> > > > starts-to/attempts to program the BARs with the
> > > > guest-physical-base-addresses carved out.
> > > >
> > > > iii)
> > > > These attempts to program the BARs (lying in the
> > > > "passthrough-device"'s config-space), are intercepted by the
> > > > hypervisor instead (causing a VM-exit in the interim).
> > > >
> > > > iv)
> > > > The hypervisor uses the above info to update the EPT, to ensure GPA =>
> > > > HPA conversions go fine when the guest tries to access the PCI-MMIO
> > > > regions later (once the guest is fully booted up). Also, the hypervisor
> > > > marks the operation as success (without "really" re-programming the
> > > > BARs).
> > > >
> > > > v)
> > > > The VM-entry is called, and the guest resumes with the "impression"
> > > > that the BARs have been "programmed by guest".
> > > >
> > > > Is the above sequencing correct at a bird's view level?
> >
> > It's not far off.  The key is simply that we can create a host virtual
> > mapping to the device BARs, ie. an mmap.  The guest enumerates emulated
> > BARs, they're only used for sizing and locating the BARs in the guest
> > physical address space.  When the guest BAR is programmed and memory
> > enabled, the address space in QEMU is populated at the BAR indicated
> > GPA using the mmap backing.  KVM memory slots are used to fill the
> > mappings in the vCPU.  The same BAR mmap is also used to provide DMA
> > mapping of the BAR through the IOMMU in the legacy type1 IOMMU backend
> > case.  Barring a vIOMMU, the IOMMU IOVA space is the guest physical
> > address space.  Thanks,
> >
> > Alex

^ permalink raw reply	[flat|nested] 7+ messages in thread
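[Editor's note: the ECAM/MMCONFIG layout implied by the MCFG question
can be shown concretely. Once firmware publishes an MMCONFIG base
address in the ACPI MCFG table, a config-space register is just a
memory address, computed per the standard ECAM formula below. The base
address used is a common but hypothetical choice; real firmware may
place it elsewhere.]

```python
# ECAM address computation: each bus/device/function gets a 4 KiB
# config-space window inside the MMCONFIG region, at an offset formed
# from (bus << 20 | device << 15 | function << 12).

ECAM_BASE = 0xE000_0000  # hypothetical MMCONFIG base from the MCFG table

def ecam_addr(bus, dev, fn, offset):
    assert bus < 256 and dev < 32 and fn < 8 and offset < 4096
    return ECAM_BASE | (bus << 20) | (dev << 15) | (fn << 12) | offset

# The config dword at offset 0x10 (BAR0) of device 00:02.0:
assert ecam_addr(0, 2, 0, 0x10) == 0xE001_0010
print(hex(ecam_addr(0, 2, 0, 0x10)))
```

This is why the guest kernel needs no BIOS involvement after boot: it
reads the MCFG table once, then performs config accesses as plain MMIO
reads/writes into this region, which the hypervisor traps and emulates
like any other unmapped guest-physical range.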