* virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Eric Auger @ 2023-01-09 13:24 UTC
To: Jean-Philippe Brucker, qemu list
Cc: Peter Xu, Alex Williamson, Michael S. Tsirkin, jasowang@redhat.com

Hi,

we have a problem with virtio-iommu and protected assigned devices
downstream of a PCIe-to-PCI bridge. In that use case we observe that the
assigned devices are not put into any IOMMU group. This is true on both
x86 and aarch64. The same use case works with intel-iommu.

*** Guest PCI topology is:
lspci -tv
-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0  Device 1234:1111
           +-02.0-[01-02]----00.0-[02]----01.0  Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
           +-02.1-[03]--
           +-02.2-[04]----00.0  Red Hat, Inc. Virtio block device
           +-0a.0  Red Hat, Inc. Device 1057
           +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
           \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

All the assigned devices are aliased and they get devfn=0x0
(see pci_device_iommu_address_space() in QEMU's hw/pci/pci.c).

Initially I see the following traces:
pci_device_iommu_address_space name=vfio-pci BDF=0x8 bus=0 devfn=0x8
pci_device_iommu_address_space name=vfio-pci BDF=0x8 bus=0 devfn=0x8 call iommu_fn with bus=0x55f556dde180 and devfn=0
virtio_iommu_init_iommu_mr init virtio-iommu-memory-region-0-0

Note that the bus number is 0 at this time and the devfn used in
virtio-iommu is 0. So an associated IOMMU MR is created with this bus at
the devfn=0 slot. This happens before the actual bus numbering.

However, later on I see virtio_iommu_probe() and virtio_iommu_attach()
getting called with ep_id=520 and failing, because in the QEMU
virtio-iommu device virtio_iommu_mr(ep_id) fails to find the iommu_mr and
-ENOENT is returned.

On the guest side I see that acpi_iommu_configure_id()/iommu_probe_device()
fails (in __iommu_probe_device()), and __iommu_attach_device() would fail
anyway.

I guess those get called before the actual bus number recomputation?

On aarch64 I eventually see the "good" MR being created, i.e. featuring
the right bus number:
qemu-system-aarch64: pci_device_iommu_address_space name=vfio-pci BDF=0x208 bus=2 devfn=0x8
qemu-system-aarch64: pci_device_iommu_address_space name=vfio-pci BDF=0x208 bus=2 devfn=0x8 call iommu_fn with bus=0xaaaaef12c450 and devfn=0

But this does not happen on x86.

Jean, do you have any idea how to fix that? Do you think we have a
problem in the ACPI/VIOT setup or in the virtio-iommu probe sequence? It
looks like the virtio probe and attach commands are called too early,
before the bus is actually correctly numbered.

Thanks

Eric
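For anyone following the numbers in this report: ep_id=520 is 0x208. Assuming the endpoint ID is simply the 16-bit PCI requester ID on segment 0 (an assumption on my side, although it matches the BDF=0x208 seen in the aarch64 trace), that is bus 2, device 1, function 0, i.e. the Broadcom NIC at 02:01.0 in the lspci output, with devfn=0x8. The small C sketch below only does that arithmetic; `bdf_decode` and `bdf_devfn` are illustrative names, not QEMU functions.

```c
#include <stdint.h>

/* A 16-bit PCI requester ID split into bus, device and function. */
struct bdf {
    uint8_t bus;   /* bits 15:8 */
    uint8_t dev;   /* bits 7:3  */
    uint8_t fn;    /* bits 2:0  */
};

/* Unpack a requester ID using the standard bus/device/function layout. */
static struct bdf bdf_decode(uint16_t rid)
{
    struct bdf b = {
        .bus = (uint8_t)(rid >> 8),
        .dev = (uint8_t)((rid >> 3) & 0x1f),
        .fn  = (uint8_t)(rid & 0x7),
    };
    return b;
}

/* devfn as printed in the traces: device and function packed in 8 bits. */
static uint8_t bdf_devfn(uint16_t rid)
{
    return (uint8_t)(rid & 0xff);
}
```

With these helpers, `bdf_decode(520)` gives bus 2, device 1, function 0, and `bdf_devfn(520)` gives 0x08, whereas the region in the first trace was created for devfn 0.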
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Eric Auger @ 2023-01-09 21:11 UTC
To: Jean-Philippe Brucker, qemu list
Cc: Peter Xu, Alex Williamson, Michael S. Tsirkin, jasowang@redhat.com

Hi,

On 1/9/23 14:24, Eric Auger wrote:
> [...]
> Jean, do you have any idea how to fix that? Do you think we have a
> problem in the ACPI/VIOT setup or in the virtio-iommu probe sequence? It
> looks like the virtio probe and attach commands are called too early,
> before the bus is actually correctly numbered.

So after further investigation, it looks like this is not a bus number
problem: the bus number is correct at the time of the virtio command
calls. Rather, the problem is related to the devfn: 0 was used when
creating the IOMMU MR, whereas the virtio-iommu commands look for the
non-aliased devfn. With that fixed, the probe and attach at least succeed.
The device still does not work for me, but I will continue my
investigations and send a tentative fix.

Thanks

Eric
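To make the devfn mismatch described in this message concrete, here is a toy model in C. It is not the QEMU data structure: `toy_bus`, `toy_create_mr` and `toy_find_mr` are hypothetical names, and the per-bus array is only an assumption about the general shape of the lookup. The region is registered at the aliased devfn (0), while the lookup uses the devfn encoded in the endpoint ID (0x8), so it comes back empty.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DEVFN_MAX 256

/* Hypothetical stand-in for a per-bus array of IOMMU memory regions. */
struct toy_bus {
    uint8_t bus_num;                 /* number programmed by the guest */
    const char *mr[TOY_DEVFN_MAX];   /* NULL when no region was created */
};

/* Region creation as in the trace: the caller passes the aliased devfn. */
static void toy_create_mr(struct toy_bus *bus, uint8_t devfn, const char *name)
{
    bus->mr[devfn] = name;
}

/* Lookup by endpoint ID, using the devfn encoded in the ID unchanged. */
static const char *toy_find_mr(const struct toy_bus *bus, uint32_t ep_id)
{
    if (((ep_id >> 8) & 0xff) != bus->bus_num) {
        return NULL;                 /* endpoint is on another bus */
    }
    return bus->mr[ep_id & 0xff];
}
```

In this model a region created at devfn 0 on bus 2 is found for endpoint 0x200 but not for 0x208, which is the kind of failure the probe and attach requests run into. Any real fix has to decide where the alias is taken into account; the sketch takes no position on that.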
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Jason Wang @ 2023-01-11 7:14 UTC
To: Eric Auger
Cc: Jean-Philippe Brucker, qemu list, Peter Xu, Alex Williamson, Michael S. Tsirkin

On Tue, Jan 10, 2023 at 5:11 AM Eric Auger <eauger@redhat.com> wrote:
>
> Hi,
>
> On 1/9/23 14:24, Eric Auger wrote:
> > [...]
>
> So after further investigation, it looks like this is not a bus number
> problem: the bus number is correct at the time of the virtio command
> calls. Rather, the problem is related to the devfn: 0 was used when
> creating the IOMMU MR, whereas the virtio-iommu commands look for the
> non-aliased devfn. With that fixed, the probe and attach at least succeed.
> The device still does not work for me, but I will continue my
> investigations and send a tentative fix.

I haven't thought about this deeply, but one thing comes to mind in case
it helps: intel-iommu doesn't use the bus number as the key for hashing
address spaces, since the bus number can be configured by the guest:

/*
 * Note that we use pointer to PCIBus as the key, so hashing/shifting
 * based on the pointer value is intended. Note that we deal with
 * collisions through vtd_as_equal().
 */
static guint vtd_as_hash(gconstpointer v)
{
    const struct vtd_as_key *key = v;
    guint value = (guint)(uintptr_t)key->bus;

    return (guint)(value << 8 | key->devfn);
}

Thanks
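The point of keying on the bus pointer can be shown with a self-contained C sketch. This is a simplified model without GLib, not the intel-iommu code: `toy_pci_bus`, `toy_as_key`, `toy_as_hash` and `toy_as_equal` are made-up names, and the equality helper is only my guess at what a companion to the quoted hash function would need to compare.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCIBus: only the guest-programmed number. */
struct toy_pci_bus {
    uint8_t bus_num;
};

struct toy_as_key {
    const struct toy_pci_bus *bus;  /* identity of the bus object */
    uint8_t devfn;
};

/* Hash on the pointer value, mirroring the shape of vtd_as_hash(). */
static unsigned int toy_as_hash(const struct toy_as_key *key)
{
    unsigned int value = (unsigned int)(uintptr_t)key->bus;

    return value << 8 | key->devfn;
}

/* Hash collisions are resolved by comparing the full key. */
static bool toy_as_equal(const struct toy_as_key *a, const struct toy_as_key *b)
{
    return a->bus == b->bus && a->devfn == b->devfn;
}
```

Because neither function reads `bus_num`, a key built before the guest renumbers the bus still hashes and compares equal to one built afterwards. A different devfn on the same bus is a different key, which is why an aliased devfn still matters with this scheme.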
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Eric Auger @ 2023-01-18 18:38 UTC
To: Jason Wang
Cc: Jean-Philippe Brucker, qemu list, Peter Xu, Alex Williamson, Michael S. Tsirkin

Hi Jason,

On 1/11/23 08:14, Jason Wang wrote:
> On Tue, Jan 10, 2023 at 5:11 AM Eric Auger <eauger@redhat.com> wrote:
>> [...]
>> So after further investigation, it looks like this is not a bus number
>> problem: the bus number is correct at the time of the virtio command
>> calls. Rather, the problem is related to the devfn: 0 was used when
>> creating the IOMMU MR, whereas the virtio-iommu commands look for the
>> non-aliased devfn. With that fixed, the probe and attach at least succeed.
>> The device still does not work for me, but I will continue my
>> investigations and send a tentative fix.
>
> Haven't thought this deeply, just one thing in my mind and in case
> that may help:

Sorry for the delay, I did not see the follow-ups on this thread :-(

> intel-iommu doesn't use bus no as the key for hashing address spaces
> since it could be configured by the guest:
>
> /*
>  * Note that we use pointer to PCIBus as the key, so hashing/shifting
>  * based on the pointer value is intended. Note that we deal with
>  * collisions through vtd_as_equal().
>  */
> static guint vtd_as_hash(gconstpointer v)
> {
>     const struct vtd_as_key *key = v;
>     guint value = (guint)(uintptr_t)key->bus;
>
>     return (guint)(value << 8 | key->devfn);
> }

I think we have something similar in virtio-iommu. We use the old flavour,
"as_by_busptr", whose key is the PCIBus pointer. This was basically copied
from intel-iommu, where you later replaced it with da8d439c8048
("intel-iommu: drop VTDBus").

Thanks

Eric
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Jean-Philippe Brucker @ 2023-01-13 12:39 UTC
To: Eric Auger
Cc: qemu list, Peter Xu, Alex Williamson, Michael S. Tsirkin, jasowang@redhat.com

Hi,

On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:
> > Jean, do you have any idea how to fix that? Do you think we have a
> > problem in the ACPI/VIOT setup or in the virtio-iommu probe sequence? It
> > looks like the virtio probe and attach commands are called too early,
> > before the bus is actually correctly numbered.
>
> So after further investigation, it looks like this is not a bus number
> problem: the bus number is correct at the time of the virtio command
> calls. Rather, the problem is related to the devfn: 0 was used when
> creating the IOMMU MR, whereas the virtio-iommu commands look for the
> non-aliased devfn. With that fixed, the probe and attach at least succeed.
> The device still does not work for me, but I will continue my
> investigations and send a tentative fix.

If I remember correctly VIOT can deal with bus numbers because bridges are
assigned a range by QEMU, but I haven't tested that in detail, and I don't
know how it holds with conventional PCI bridges. Do you have an example
command-line I could use to experiment (and the fix you're mentioning)?

Thanks,
Jean
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Alex Williamson @ 2023-01-13 17:57 UTC
To: Jean-Philippe Brucker
Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin, jasowang@redhat.com

On Fri, 13 Jan 2023 12:39:18 +0000
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> Hi,
>
> On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:
> > [...]
>
> If I remember correctly VIOT can deal with bus numbers because bridges are
> assigned a range by QEMU, but I haven't tested that in detail, and I don't
> know how it holds with conventional PCI bridges.

In my reading of the virtio-iommu spec, I noted that it specifies the
bus numbers *at the time of OS handoff*, so it essentially washes its
hands of the OS renumbering buses while leaving subtle dependencies on
initial numbering in the guest and QEMU implementations.

On bare metal, a conventional bridge aliases the devices downstream of
it. We reflect that in QEMU by aliasing those devices to the
AddressSpace of the bridge. IIRC, Linux guests will use a
for-each-dma-alias function when programming IOMMU translation tables
to populate the bridge alias, where a physical IOMMU would essentially
only see that bridge RID. Thanks,

Alex
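The aliasing described above can be sketched as a walk from the device towards the root. This is a simplified model in C, loosely based on my understanding of what the Linux helper (pci_for_each_dma_alias(), if I have the name right) does. The rule that a PCIe-to-PCI bridge makes requests appear as (secondary bus, devfn 0), while a conventional bridge makes them appear as the bridge itself, is an assumption of this sketch, and the `toy_*` names are hypothetical.

```c
#include <stdint.h>

enum toy_bridge_type {
    TOY_PCIE_PORT,          /* PCIe root or downstream port: no aliasing */
    TOY_PCIE_TO_PCI_BRIDGE, /* requests appear as (secondary bus, devfn 0) */
    TOY_PCI_BRIDGE,         /* conventional bridge: requests appear as the bridge */
};

struct toy_bridge {
    enum toy_bridge_type type;
    uint16_t rid;           /* the bridge's own bus:devfn */
    uint8_t secondary_bus;  /* bus number directly below the bridge */
};

/*
 * Walk from the device towards the root; path[0] is the bridge closest to
 * the device. Returns the requester ID an upstream IOMMU would observe.
 */
static uint16_t toy_dma_alias(uint16_t dev_rid,
                              const struct toy_bridge *path, int n)
{
    uint16_t alias = dev_rid;

    for (int i = 0; i < n; i++) {
        switch (path[i].type) {
        case TOY_PCIE_TO_PCI_BRIDGE:
            alias = (uint16_t)(path[i].secondary_bus << 8);
            break;
        case TOY_PCI_BRIDGE:
            alias = path[i].rid;
            break;
        case TOY_PCIE_PORT:
            break;
        }
    }
    return alias;
}
```

Applied to the topology in this thread (NIC at 02:01.0 behind the PCIe-to-PCI bridge at 01:00.0, itself behind the root port at 00:02.0), the model yields alias 0x200, i.e. bus 2 with devfn 0, which is consistent with the devfn=0 seen in the QEMU traces.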
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Jean-Philippe Brucker @ 2023-01-18 18:03 UTC
To: Alex Williamson
Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin, jasowang@redhat.com

On Fri, Jan 13, 2023 at 10:57:00AM -0700, Alex Williamson wrote:
> On Fri, 13 Jan 2023 12:39:18 +0000
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> > [...]
> > If I remember correctly VIOT can deal with bus numbers because bridges are
> > assigned a range by QEMU, but I haven't tested that in detail, and I don't
> > know how it holds with conventional PCI bridges.
>
> In my reading of the virtio-iommu spec,

Hm, is that the virtio-iommu spec or the ACPI VIOT/device tree spec?
The virtio-iommu spec shouldn't refer to PCI buses at the moment. The
intent is that for PCI, the "endpoint ID" passed in an ATTACH request
corresponds to the PCI segment and RID of PCI devices at the time of the
request (so after the OS renumbered the buses). If you found something in
the spec that contradicts this, it should be fixed. Note that "endpoint"
is a misnomer: it can refer to PCI bridges as well, anything that can
issue DMA transactions.

> I noted that it specifies the
> bus numbers *at the time of OS handoff*, so it essentially washes its
> hands of the OS renumbering buses while leaving subtle dependencies on
> initial numbering in the guest and QEMU implementations.

Yes, we needed to describe in the firmware tables (device-tree and ACPI
VIOT) which devices the IOMMU manages. And at the time we generate the
tables, if we want to refer to PCI devices behind bridges, we can either
use catch-all ranges for any possible bus numbers they will get, or
initialize bus numbers in bridges and pass those to the OS.

But that's only to communicate the IOMMU topology to the OS, because we
couldn't come up with anything better. After it sets up PCI, the OS should
be able to use its own configuration of the PCI topology in virtio-iommu
requests.

> On bare metal, a conventional bridge aliases the devices downstream of
> it. We reflect that in QEMU by aliasing those devices to the
> AddressSpace of the bridge. IIRC, Linux guests will use a
> for-each-dma-alias function when programming IOMMU translation tables
> to populate the bridge alias, where a physical IOMMU would essentially
> only see that bridge RID. Thanks,

Yes, there might be something missing in the Linux driver, I'll have a
look.

Thanks,
Jean
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Alex Williamson @ 2023-01-18 18:28 UTC
To: Jean-Philippe Brucker
Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin, jasowang@redhat.com

On Wed, 18 Jan 2023 18:03:13 +0000
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> On Fri, Jan 13, 2023 at 10:57:00AM -0700, Alex Williamson wrote:
> > [...]
> > In my reading of the virtio-iommu spec,
>
> Hm, is that the virtio-iommu spec or ACPI VIOT/device tree spec?
> The virtio-iommu spec shouldn't refer to PCI buses at the moment. The
> intent is that for PCI, the "endpoint ID" passed in an ATTACH request
> corresponds to PCI segment and RID of PCI devices at the time of the
> request (so after the OS renumbered the buses). If you found something in
> the spec that contradicts this, it should be fixed. Note that "endpoint"
> is a misnomer, it can refer to PCI bridges as well, anything that can
> issue DMA transactions.

Sorry, the ACPI spec defining the VIOT table[1]:

  Each node identifies one or more devices using either their PCI
  Handle or their base MMIO (Memory-Mapped I/O) address. A PCI
  Handle is a PCI Segment number and a BDF (Bus-Device-Function)
  with the following layout:

    * Bits 15:8 Bus Number
    * Bits 7:3 Device Number
    * Bits 2:0 Function Number

  This identifier corresponds to the one observed by the
  operating system when parsing the PCI configuration space for
  the first time after boot.

> > I noted that it specifies the
> > bus numbers *at the time of OS handoff*, so it essentially washes its
> > hands of the OS renumbering buses while leaving subtle dependencies on
> > initial numbering in the guest and QEMU implementations.
>
> Yes we needed to describe in the firmware tables (device-tree and ACPI
> VIOT) which devices the IOMMU manages. And at the time we generate the
> tables, if we want to refer to PCI devices behind bridges, we can either
> use catch-all ranges for any possible bus numbers they will get, or
> initialize bus numbers in bridges and pass those to the OS.
>
> But that's only to communicate the IOMMU topology to the OS, because we
> couldn't come up with anything better. After it sets up PCI the OS should
> be able to use its own configuration of the PCI topology in virtio-iommu
> requests.

The VT-d spec[2] (8.3.1) has a more elegant solution using a path
described in a device scope, based on a root bus number (not
susceptible to OS renumbering) and a sequence of devfns to uniquely
describe a hierarchy or endpoint, invariant of OS bus renumbering.
Thanks,

Alex

[1] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#virtual-i-o-translation-viot-table-header
[2] https://cdrdv2-public.intel.com/671081/vt-directed-io-spec.pdf
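The path idea from the VT-d device scope can be sketched in C as follows. This is a toy model, not the DMAR table format: `toy_bridge`, `toy_secondary_bus` and `toy_resolve_path` are hypothetical names, and the bridge array stands in for reading each bridge's secondary bus number from config space. A scope is a start bus plus a list of devfns, and it is resolved against whatever bus numbers are currently programmed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bridge record: where it sits and which bus is below it. */
struct toy_bridge {
    uint8_t bus;
    uint8_t devfn;
    uint8_t secondary_bus;  /* may be rewritten by the OS */
};

/* Find the secondary bus of the bridge at bus:devfn, or -1 if none. */
static int toy_secondary_bus(const struct toy_bridge *br, size_t n,
                             uint8_t bus, uint8_t devfn)
{
    for (size_t i = 0; i < n; i++) {
        if (br[i].bus == bus && br[i].devfn == devfn) {
            return br[i].secondary_bus;
        }
    }
    return -1;
}

/*
 * Resolve a (start bus, devfn path) scope to the current 16-bit RID.
 * Every path element but the last must name a bridge. Returns -1 on error.
 */
static int toy_resolve_path(const struct toy_bridge *br, size_t n,
                            uint8_t start_bus,
                            const uint8_t *path, size_t len)
{
    int bus = start_bus;

    if (len == 0) {
        return -1;
    }
    for (size_t i = 0; i + 1 < len; i++) {
        bus = toy_secondary_bus(br, n, (uint8_t)bus, path[i]);
        if (bus < 0) {
            return -1;
        }
    }
    return bus << 8 | path[len - 1];
}
```

For the NIC in this thread the path from bus 0 would be devfn 0x10 (root port 00:02.0), devfn 0x00 (the PCIe-to-PCI bridge) and devfn 0x08 (the device). With the bus numbers from the lspci output this resolves to 0x208. If the OS renumbered the buses below the root port, the same path would resolve to the new RID without the table having to change.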
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
From: Eric Auger @ 2023-01-18 18:48 UTC
To: Alex Williamson, Jean-Philippe Brucker
Cc: qemu list, Peter Xu, Michael S. Tsirkin, jasowang@redhat.com

Hi,

On 1/18/23 19:28, Alex Williamson wrote:
> [...]
> The VT-d spec[2](8.3.1) has a more elegant solution using a path
> described in a device scope, based on a root bus number (not
> susceptible to OS renumbering) and a sequence of devfns to uniquely
> describe a hierarchy or endpoint, invariant of OS bus renumbering.
> Thanks,

Independently of the potential issue raised by Alex about later bus
renumbering, I observe that the VIOT content, in my case, is correct and
properly advertises the translation of the RIDs of all my devices. So the
IOMMU group topology issue I have in the guest is not due to the VIOT ACPI
table content.

Eric

>
> Alex
>
> [1] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#virtual-i-o-translation-viot-table-header
> [2] https://cdrdv2-public.intel.com/671081/vt-directed-io-spec.pdf
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group 2023-01-18 18:28 ` Alex Williamson 2023-01-18 18:48 ` Eric Auger @ 2023-01-20 15:35 ` Jean-Philippe Brucker 1 sibling, 0 replies; 11+ messages in thread From: Jean-Philippe Brucker @ 2023-01-20 15:35 UTC (permalink / raw) To: Alex Williamson Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin, jasowang@redhat.com On Wed, Jan 18, 2023 at 11:28:32AM -0700, Alex Williamson wrote: > The VT-d spec[2](8.3.1) has a more elegant solution using a path > described in a device scope, based on a root bus number (not > susceptible to OS renumbering) and a sequence of devfns to uniquely > describe a hierarchy or endpoint, invariant of OS bus renumbering. That's a good idea, we could describe the hierarchy using only devfns. I think I based VIOT mostly on IORT and device-tree, which don't provide that as far as I know, but I could have studied DMAR better. One problem is that for virtio-iommu we'd need to update both device-tree and VIOT (and neither is easy to change). But it's worth thinking about because it would solve a problem we currently have: a virtio-iommu using the virtio-pci transport cannot be placed behind a bridge, including a root port, because the firmware tables cannot refer to it. Thanks, Jean
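The path-based description discussed above (a root bus number plus a sequence of devfns) can be illustrated with a short sketch. It is a toy model of the idea only, not the VT-d data structure or any QEMU or kernel code: `resolve_path` and the `secondary_bus` dictionary are assumptions made for this note, and the topology is taken from the lspci output in the first message (00:02.0 bridge, then 01:00.0 bridge, then endpoint 02:01.0).

```python
from typing import Dict, List, Tuple

DevFn = Tuple[int, int]              # (device, function)
BridgeKey = Tuple[int, int, int]     # (bus, device, function) of a bridge


def resolve_path(root_bus: int, path: List[DevFn],
                 secondary_bus: Dict[BridgeKey, int]) -> int:
    """Walk a devfn path from root_bus and return the BDF of the last hop.

    secondary_bus maps each bridge to the bus number currently programmed
    behind it (hypothetical stand-in for reading the bridge's config space).
    """
    if not path:
        raise ValueError("path must contain at least one hop")
    bus = root_bus
    # Every hop except the last one must be a bridge we can descend through.
    for device, function in path[:-1]:
        key = (bus, device, function)
        if key not in secondary_bus:
            raise LookupError(f"no bridge at {bus:02x}:{device:02x}.{function}")
        bus = secondary_bus[key]
    device, function = path[-1]
    return (bus << 8) | (device << 3) | function


# The same path, evaluated against two different bus numberings.
PATH = [(2, 0), (0, 0), (1, 0)]
AS_IN_THREAD = {(0, 2, 0): 1, (1, 0, 0): 2}
RENUMBERED = {(0, 2, 0): 5, (5, 0, 0): 6}   # made-up renumbering

if __name__ == "__main__":
    print(hex(resolve_path(0, PATH, AS_IN_THREAD)))
    print(hex(resolve_path(0, PATH, RENUMBERED)))
```

The path itself never mentions a bus number other than the root, so it stays valid when the OS renumbers the buses; only the BDF it resolves to changes (0x208 with the numbering from the thread, 0x608 with the made-up one).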
* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group 2023-01-13 12:39 ` Jean-Philippe Brucker 2023-01-13 17:57 ` Alex Williamson @ 2023-01-18 18:40 ` Eric Auger 1 sibling, 0 replies; 11+ messages in thread From: Eric Auger @ 2023-01-18 18:40 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: qemu list, Peter Xu, Alex Williamson, Michael S. Tsirkin, jasowang@redhat.com Hi Jean, On 1/13/23 13:39, Jean-Philippe Brucker wrote: > Hi, > > On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote: >>> Jean, do you have any idea about how to fix that? Do you think we have a >>> trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks >>> like virtio probe and attach commands are called too early, before the >>> bus is actually correctly numbered. >> >> So after further investigations looks this is not a problem of bus >> number, which is good at the time of the virtio cmd calls but rather a >> problem related to the devfn (0 was used when creating the IOMMU MR) >> whereas the virtio-iommu cmds looks for the non aliased devfn. With that >> fixed, the probe and attach at least succeeds. The device still does not >> work for me but I will continue my investigations and send a tentative fix. > > If I remember correctly VIOT can deal with bus numbers because bridges are > assigned a range by QEMU, but I haven't tested that in detail, and I don't > know how it holds with conventional PCI bridges. Do you have an example > command-line I could use to experiment (and the fix you're mentioning)? You will find command line examples in [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr() https://lore.kernel.org/all/20230116124709.793084-1-eric.auger@redhat.com/ Please let me know if you need additional details. Eric > > Thanks, > Jean >
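The devfn mismatch described in the quoted text above (the IOMMU MR was created with devfn 0, while the virtio-iommu commands look up the non-aliased devfn) can be pictured with a toy lookup table. This is a deliberately simplified model, not the code of the RFC and not QEMU's data structures: `make_key`, `find_mr`, the `alias_of` map and the region label are all hypothetical.

```python
from typing import Dict, Optional


def make_key(bus: int, devfn: int) -> int:
    """Combine a bus number and an 8-bit devfn into a 16-bit lookup key."""
    return ((bus & 0xFF) << 8) | (devfn & 0xFF)


def find_mr(regions: Dict[int, str], ep_id: int,
            alias_of: Optional[Dict[int, int]] = None) -> Optional[str]:
    """Look up a region by endpoint ID, optionally retrying through an alias.

    Returns None when nothing matches (standing in for -ENOENT).
    alias_of is a hypothetical map from a device's own ID to the ID it is
    aliased to; it only illustrates the idea of taking aliasing into account.
    """
    if ep_id in regions:
        return regions[ep_id]
    if alias_of is not None and ep_id in alias_of:
        return regions.get(alias_of[ep_id])
    return None


# Region registered under the aliased devfn 0 on bus 2 (placeholder label).
REGIONS = {make_key(2, 0x0): "mr-bus2-devfn0"}
# The guest addresses the device by its own devfn 0x8: ep_id 0x208 == 520.
GUEST_EP_ID = make_key(2, 0x8)

if __name__ == "__main__":
    print(find_mr(REGIONS, GUEST_EP_ID))
    print(find_mr(REGIONS, GUEST_EP_ID, {GUEST_EP_ID: make_key(2, 0x0)}))
```

In this model the plain lookup for ep_id 520 finds nothing, because the only entry sits at key 0x200; the lookup succeeds once the alias is consulted. How the actual RFC resolves the alias is described in the linked patch, not here.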