virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group


* virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
@ 2023-01-09 13:24 Eric Auger
  2023-01-09 21:11 ` Eric Auger
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Auger @ 2023-01-09 13:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker, qemu list
  Cc: Peter Xu, Alex Williamson, Michael S. Tsirkin,
	jasowang@redhat.com

Hi,

We are having trouble with virtio-iommu and protected assigned devices
downstream of a PCIe-to-PCI bridge. In that use case, we observe that the
assigned devices are not placed in any IOMMU group. This is true on both
x86 and aarch64. The same use case works with intel-iommu.

*** Guest PCI topology is:
lspci -tv
-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0  Device 1234:1111
           +-02.0-[01-02]----00.0-[02]----01.0  Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
           +-02.1-[03]--
           +-02.2-[04]----00.0  Red Hat, Inc. Virtio block device
           +-0a.0  Red Hat, Inc. Device 1057
           +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
           \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

All the assigned devices are aliased and get devfn=0x0; see
pci_device_iommu_address_space() in QEMU's hw/pci/pci.c.

Initially I see the following traces:
pci_device_iommu_address_space name=vfio-pci BDF=0x8 bus=0 devfn=0x8
pci_device_iommu_address_space name=vfio-pci BDF=0x8 bus=0 devfn=0x8
call iommu_fn with bus=0x55f556dde180 and devfn=0
virtio_iommu_init_iommu_mr init virtio-iommu-memory-region-0-0

Note that the bus number is 0 at this point and the devfn used in
virtio-iommu is 0, so the associated IOMMU MR is created for this bus in
the devfn=0 slot. This happens before the buses are actually numbered.

However, later on I see virtio_iommu_probe() and virtio_iommu_attach()
being called with ep_id=520, and they fail because, in the QEMU
virtio-iommu device, virtio_iommu_mr(ep_id) does not find the iommu_mr,
so -ENOENT is returned.

On the guest side, I see that
acpi_iommu_configure_id()/iommu_probe_device() fails (in
__iommu_probe_device()), and __iommu_attach_device() would fail anyway.

I guess those get called before the actual bus number recomputation?

On aarch64, I eventually see the "good" MR being created, i.e. one
featuring the right bus number:
qemu-system-aarch64: pci_device_iommu_address_space name=vfio-pci
BDF=0x208 bus=2 devfn=0x8
qemu-system-aarch64: pci_device_iommu_address_space name=vfio-pci
BDF=0x208 bus=2 devfn=0x8 call iommu_fn with bus=0xaaaaef12c450 and devfn=0

But this does not happen on x86.

Jean, do you have any idea how to fix that? Do you think we have a
problem in the ACPI/VIOT setup or in the virtio-iommu probe sequence? It
looks like the virtio probe and attach commands are called too early,
before the bus is actually correctly numbered.

Thanks

Eric


* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-09 13:24 virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group Eric Auger
@ 2023-01-09 21:11 ` Eric Auger
  2023-01-11  7:14   ` Jason Wang
  2023-01-13 12:39   ` Jean-Philippe Brucker
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Auger @ 2023-01-09 21:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker, qemu list
  Cc: Peter Xu, Alex Williamson, Michael S. Tsirkin,
	jasowang@redhat.com

Hi,

On 1/9/23 14:24, Eric Auger wrote:
> Jean, do you have any idea about how to fix that? Do you think we have a
> trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
> like virtio probe and attach commands are called too early, before the
> bus is actually correctly numbered.

After further investigation, it looks like this is not a bus number
problem: the bus number is correct at the time of the virtio command
calls. It is rather a problem with the devfn: 0 (the alias) was used when
creating the IOMMU MR, whereas the virtio-iommu commands look up the
non-aliased devfn. With that fixed, the probe and attach at least succeed.
The device still does not work for me, but I will continue investigating
and send a tentative fix.
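
To make the mismatch concrete, here is a minimal sketch. It is not QEMU
code and not necessarily what the eventual fix does: `toy_bus`,
`toy_mr_register()`, `toy_mr_lookup()` and `toy_mr_lookup_aliased()` are
hypothetical names, and the unconditional fallback to devfn 0 is a
simplifying assumption that only models a device behind a single
conventional bridge.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_DEVFN_MAX 256

/* Toy stand-in for the per-bus table of IOMMU memory regions (hypothetical). */
struct toy_bus {
    int bus_num;                    /* bus number currently programmed */
    const char *mr[TOY_DEVFN_MAX];  /* one slot per devfn, NULL if unused */
};

/* Record an MR name in the slot of the given devfn. */
static void toy_mr_register(struct toy_bus *bus, uint8_t devfn, const char *name)
{
    bus->mr[devfn] = name;
}

/* Strict lookup: only the exact devfn encoded in the endpoint ID matches. */
static const char *toy_mr_lookup(const struct toy_bus *bus, uint32_t ep_id)
{
    if (((ep_id >> 8) & 0xff) != (uint32_t)bus->bus_num) {
        return NULL;
    }
    return bus->mr[ep_id & 0xff];
}

/*
 * Alias-aware lookup (assumption): if the exact devfn has no MR, fall back
 * to devfn 0, the slot that was used when the aliased MR was created.
 */
static const char *toy_mr_lookup_aliased(const struct toy_bus *bus, uint32_t ep_id)
{
    const char *mr = toy_mr_lookup(bus, ep_id);

    if (mr == NULL && ((ep_id >> 8) & 0xff) == (uint32_t)bus->bus_num) {
        mr = bus->mr[0];
    }
    return mr;
}
```

With an MR registered only at devfn 0 of bus 2, the strict lookup of
ep_id=520 (bus 2, devfn 0x8) finds nothing, which corresponds to the
failure described above, while the alias-aware lookup returns the devfn 0
entry.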

Thanks

Eric

* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-09 21:11 ` Eric Auger
@ 2023-01-11  7:14   ` Jason Wang
  2023-01-18 18:38     ` Eric Auger
  2023-01-13 12:39   ` Jean-Philippe Brucker
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Wang @ 2023-01-11  7:14 UTC (permalink / raw)
  To: Eric Auger
  Cc: Jean-Philippe Brucker, qemu list, Peter Xu, Alex Williamson,
	Michael S. Tsirkin

On Tue, Jan 10, 2023 at 5:11 AM Eric Auger <eauger@redhat.com> wrote:
> So after further investigations looks this is not a problem of bus
> number, which is good at the time of the virtio cmd calls but rather a
> problem related to the devfn (0 was used when creating the IOMMU MR)
> whereas the virtio-iommu cmds looks for the non aliased devfn. With that
> fixed, the probe and attach at least succeeds. The device still does not
> work for me but I will continue my investigations and send a tentative fix.

I haven't thought about this deeply, but one thing comes to mind that
may help:

intel-iommu doesn't use the bus number as the key for hashing address
spaces, since it can be reconfigured by the guest:

/*
 * Note that we use pointer to PCIBus as the key, so hashing/shifting
 * based on the pointer value is intended. Note that we deal with
 * collisions through vtd_as_equal().
 */
static guint vtd_as_hash(gconstpointer v)
{
    const struct vtd_as_key *key = v;
    guint value = (guint)(uintptr_t)key->bus;

    return (guint)(value << 8 | key->devfn);
}
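
For illustration only, here is a self-contained sketch of the same idea
without GLib. The types `toy_pci_bus` and `toy_as_key` and the helper
`toy_as_equal()` are hypothetical stand-ins, not the intel-iommu
definitions; the point is that the key holds the bus object pointer, so it
stays valid when the guest rewrites the bus number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy bus object; its number may be rewritten by the guest at any time. */
struct toy_pci_bus {
    int number;
};

/* Key made of the bus object's identity (its pointer) and the devfn. */
struct toy_as_key {
    const struct toy_pci_bus *bus;
    uint8_t devfn;
};

/* Hash on the pointer value and devfn, never on the bus number. */
static unsigned int toy_as_hash(const struct toy_as_key *key)
{
    unsigned int value = (unsigned int)(uintptr_t)key->bus;

    return (value << 8) | key->devfn;
}

/* Two keys match when they name the same bus object and the same devfn. */
static bool toy_as_equal(const struct toy_as_key *a, const struct toy_as_key *b)
{
    return a->bus == b->bus && a->devfn == b->devfn;
}
```

A key built while the bus was still numbered 0 compares equal to a key
built after the guest renumbered it to 2, because neither the hash nor the
comparison reads `number`.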

Thanks


* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-09 21:11 ` Eric Auger
  2023-01-11  7:14   ` Jason Wang
@ 2023-01-13 12:39   ` Jean-Philippe Brucker
  2023-01-13 17:57     ` Alex Williamson
  2023-01-18 18:40     ` Eric Auger
  1 sibling, 2 replies; 11+ messages in thread
From: Jean-Philippe Brucker @ 2023-01-13 12:39 UTC (permalink / raw)
  To: Eric Auger
  Cc: qemu list, Peter Xu, Alex Williamson, Michael S. Tsirkin,
	jasowang@redhat.com

Hi,

On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:
> > Jean, do you have any idea about how to fix that? Do you think we have a
> > trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
> > like virtio probe and attach commands are called too early, before the
> > bus is actually correctly numbered.
> 
> So after further investigations looks this is not a problem of bus
> number, which is good at the time of the virtio cmd calls but rather a
> problem related to the devfn (0 was used when creating the IOMMU MR)
> whereas the virtio-iommu cmds looks for the non aliased devfn. With that
> fixed, the probe and attach at least succeeds. The device still does not
> work for me but I will continue my investigations and send a tentative fix.

If I remember correctly VIOT can deal with bus numbers because bridges are
assigned a range by QEMU, but I haven't tested that in detail, and I don't
know how it holds with conventional PCI bridges. Do you have an example
command-line I could use to experiment (and the fix you're mentioning)?

Thanks,
Jean



* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-13 12:39   ` Jean-Philippe Brucker
@ 2023-01-13 17:57     ` Alex Williamson
  2023-01-18 18:03       ` Jean-Philippe Brucker
  2023-01-18 18:40     ` Eric Auger
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2023-01-13 17:57 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin,
	jasowang@redhat.com

On Fri, 13 Jan 2023 12:39:18 +0000
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> Hi,
> 
> On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:
> > > Jean, do you have any idea about how to fix that? Do you think we have a
> > > trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
> > > like virtio probe and attach commands are called too early, before the
> > > bus is actually correctly numbered.  
> > 
> > So after further investigations looks this is not a problem of bus
> > number, which is good at the time of the virtio cmd calls but rather a
> > problem related to the devfn (0 was used when creating the IOMMU MR)
> > whereas the virtio-iommu cmds looks for the non aliased devfn. With that
> > fixed, the probe and attach at least succeeds. The device still does not
> > work for me but I will continue my investigations and send a tentative fix.  
> 
> If I remember correctly VIOT can deal with bus numbers because bridges are
> assigned a range by QEMU, but I haven't tested that in detail, and I don't
> know how it holds with conventional PCI bridges.

In my reading of the virtio-iommu spec, I noted that it specifies the
bus numbers *at the time of OS handoff*, so it essentially washes its
hands of the OS renumbering buses while leaving subtle dependencies on
initial numbering in the guest and QEMU implementations.

On bare metal, a conventional bridge aliases the devices downstream of
it.  We reflect that in QEMU by aliasing those devices to the
AddressSpace of the bridge.  IIRC, Linux guests will use a
for-each-dma-alias function when programming IOMMU translation tables
to populate the bridge alias, where a physical IOMMU would essentially
only see that bridge RID.  Thanks,
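
As a rough illustration of that aliasing, assuming the usual PCIe-to-PCI
bridge behaviour where requests from the conventional side carry the
bridge's secondary bus number with devfn 0 (the helper names `toy_rid()`
and `toy_pcie_to_pci_alias()` are made up for this sketch):

```c
#include <stdint.h>

/* Build a 16-bit requester ID from bus, device and function numbers. */
static uint16_t toy_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/*
 * RID seen upstream for any device behind a PCIe-to-PCI bridge
 * (assumption stated above): secondary bus number, device 0, function 0.
 */
static uint16_t toy_pcie_to_pci_alias(uint8_t secondary_bus)
{
    return toy_rid(secondary_bus, 0, 0);
}
```

For the topology in this thread, the assigned device at 02:01.0 has RID
0x208 (520), while the alias for secondary bus 02 is 0x200, i.e. devfn 0.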

Alex




* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-13 17:57     ` Alex Williamson
@ 2023-01-18 18:03       ` Jean-Philippe Brucker
  2023-01-18 18:28         ` Alex Williamson
  0 siblings, 1 reply; 11+ messages in thread
From: Jean-Philippe Brucker @ 2023-01-18 18:03 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin,
	jasowang@redhat.com

On Fri, Jan 13, 2023 at 10:57:00AM -0700, Alex Williamson wrote:
> On Fri, 13 Jan 2023 12:39:18 +0000
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> 
> > Hi,
> > 
> > On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:
> > > > Jean, do you have any idea about how to fix that? Do you think we have a
> > > > trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
> > > > like virtio probe and attach commands are called too early, before the
> > > > bus is actually correctly numbered.  
> > > 
> > > So after further investigations looks this is not a problem of bus
> > > number, which is good at the time of the virtio cmd calls but rather a
> > > problem related to the devfn (0 was used when creating the IOMMU MR)
> > > whereas the virtio-iommu cmds looks for the non aliased devfn. With that
> > > fixed, the probe and attach at least succeeds. The device still does not
> > > work for me but I will continue my investigations and send a tentative fix.  
> > 
> > If I remember correctly VIOT can deal with bus numbers because bridges are
> > assigned a range by QEMU, but I haven't tested that in detail, and I don't
> > know how it holds with conventional PCI bridges.
> 
> In my reading of the virtio-iommu spec,

Hm, is that the virtio-iommu spec or ACPI VIOT/device tree spec?
The virtio-iommu spec shouldn't refer to PCI buses at the moment. The
intent is that for PCI, the "endpoint ID" passed in an ATTACH request
corresponds to PCI segment and RID of PCI devices at the time of the
request (so after the OS renumbered the buses). If you found something in
the spec that contradicts this, it should be fixed. Note that "endpoint"
is a misnomer, it can refer to PCI bridges as well, anything that can
issue DMA transactions.

> I noted that it specifies the
> bus numbers *at the time of OS handoff*, so it essentially washes its
> hands of the OS renumbering buses while leaving subtle dependencies on
> initial numbering in the guest and QEMU implementations.

Yes we needed to describe in the firmware tables (device-tree and ACPI
VIOT) which devices the IOMMU manages. And at the time we generate the
tables, if we want to refer to PCI devices behind bridges, we can either
use catch-all ranges for any possible bus numbers they will get, or
initialize bus numbers in bridges and pass those to the OS.

But that's only to communicate the IOMMU topology to the OS, because we
couldn't come up with anything better. After it sets up PCI the OS should
be able to use its own configuration of the PCI topology in virtio-iommu
requests.

> On bare metal, a conventional bridge aliases the devices downstream of
> it.  We reflect that in QEMU by aliasing those devices to the
> AddressSpace of the bridge.  IIRC, Linux guests will use a
> for-each-dma-alias function when programming IOMMU translation tables
> to populate the bridge alias, where a physical IOMMU would essentially
> only see that bridge RID.  Thanks,

Yes there might be something missing in the Linux driver, I'll have a look

Thanks,
Jean


* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-18 18:03       ` Jean-Philippe Brucker
@ 2023-01-18 18:28         ` Alex Williamson
  2023-01-18 18:48           ` Eric Auger
  2023-01-20 15:35           ` Jean-Philippe Brucker
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Williamson @ 2023-01-18 18:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin,
	jasowang@redhat.com

On Wed, 18 Jan 2023 18:03:13 +0000
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> On Fri, Jan 13, 2023 at 10:57:00AM -0700, Alex Williamson wrote:
> > On Fri, 13 Jan 2023 12:39:18 +0000
> > Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> >   
> > > Hi,
> > > 
> > > On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:  
> > > > > Jean, do you have any idea about how to fix that? Do you think we have a
> > > > > trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
> > > > > like virtio probe and attach commands are called too early, before the
> > > > > bus is actually correctly numbered.    
> > > > 
> > > > So after further investigations looks this is not a problem of bus
> > > > number, which is good at the time of the virtio cmd calls but rather a
> > > > problem related to the devfn (0 was used when creating the IOMMU MR)
> > > > whereas the virtio-iommu cmds looks for the non aliased devfn. With that
> > > > fixed, the probe and attach at least succeeds. The device still does not
> > > > work for me but I will continue my investigations and send a tentative fix.    
> > > 
> > > If I remember correctly VIOT can deal with bus numbers because bridges are
> > > assigned a range by QEMU, but I haven't tested that in detail, and I don't
> > > know how it holds with conventional PCI bridges.  
> > 
> > In my reading of the virtio-iommu spec,  
> 
> Hm, is that the virtio-iommu spec or ACPI VIOT/device tree spec?
> The virtio-iommu spec shouldn't refer to PCI buses at the moment. The
> intent is that for PCI, the "endpoint ID" passed in an ATTACH request
> corresponds to PCI segment and RID of PCI devices at the time of the
> request (so after the OS renumbered the buses). If you found something in
> the spec that contradicts this, it should be fixed. Note that "endpoint"
> is a misnomer, it can refer to PCI bridges as well, anything that can
> issue DMA transactions.

Sorry, the ACPI spec defining the VIOT table[1]:

	Each node identifies one or more devices using either their PCI
	Handle or their base MMIO (Memory-Mapped I/O) address. A PCI
	Handle is a PCI Segment number and a BDF (Bus-Device-Function)
	with the following layout:

	* Bits 15:8 Bus Number

	* Bits 7:3 Device Number

	* Bits 2:0 Function Number

	This identifier corresponds to the one observed by the
	operating system when parsing the PCI configuration space for
	the first time after boot.
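
A small sketch of that layout, using the hypothetical names `toy_bdf`
and `toy_bdf_decode()`, splits a 16-bit BDF into its three fields:

```c
#include <stdint.h>

/* Decoded Bus-Device-Function triple. */
struct toy_bdf {
    uint8_t bus;  /* bits 15:8 */
    uint8_t dev;  /* bits 7:3 */
    uint8_t fn;   /* bits 2:0 */
};

/* Split a 16-bit BDF according to the layout quoted above. */
static struct toy_bdf toy_bdf_decode(uint16_t bdf)
{
    struct toy_bdf out;

    out.bus = (uint8_t)(bdf >> 8);
    out.dev = (uint8_t)((bdf >> 3) & 0x1f);
    out.fn = (uint8_t)(bdf & 0x7);
    return out;
}
```

Applied to the values from earlier in the thread, BDF=0x8 decodes to
00:01.0 and ep_id=520 (0x208) decodes to 02:01.0.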

> > I noted that it specifies the
> > bus numbers *at the time of OS handoff*, so it essentially washes its
> > hands of the OS renumbering buses while leaving subtle dependencies on
> > initial numbering in the guest and QEMU implementations.  
> 
> Yes we needed to describe in the firmware tables (device-tree and ACPI
> VIOT) which devices the IOMMU manages. And at the time we generate the
> tables, if we want to refer to PCI devices behind bridges, we can either
> use catch-all ranges for any possible bus numbers they will get, or
> initialize bus numbers in bridges and pass those to the OS.
> 
> But that's only to communicate the IOMMU topology to the OS, because we
> couldn't come up with anything better. After it sets up PCI the OS should
> be able to use its own configuration of the PCI topology in virtio-iommu
> requests.

The VT-d spec[2](8.3.1) has a more elegant solution using a path
described in a device scope, based on a root bus number (not
susceptible to OS renumbering) and a sequence of devfns to uniquely
describe a hierarchy or endpoint, invariant of OS bus renumbering.
Thanks,
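
A sketch of how such a path could be resolved, under the assumption that
the bridges' current secondary bus numbers can be looked up; the toy
`toy_bridge` table and the helpers `toy_secondary_bus()` and
`toy_resolve_path()` are hypothetical and only model the idea:

```c
#include <stddef.h>
#include <stdint.h>

/* One hop of the path: a device/function pair on the current bus. */
struct toy_path_entry {
    uint8_t dev;
    uint8_t fn;
};

/* Toy bridge record: where the bridge sits and its current secondary bus. */
struct toy_bridge {
    uint8_t bus;
    uint8_t dev;
    uint8_t fn;
    uint8_t secondary;
};

/* Return the secondary bus of the bridge at bus:dev.fn, or -1 if none. */
static int toy_secondary_bus(const struct toy_bridge *bridges, size_t n,
                             uint8_t bus, uint8_t dev, uint8_t fn)
{
    for (size_t i = 0; i < n; i++) {
        if (bridges[i].bus == bus && bridges[i].dev == dev &&
            bridges[i].fn == fn) {
            return bridges[i].secondary;
        }
    }
    return -1;
}

/*
 * Walk the path from the root bus through each bridge's current secondary
 * bus; return the 16-bit BDF of the last entry, or -1 on failure.
 */
static int toy_resolve_path(const struct toy_bridge *bridges, size_t n,
                            uint8_t start_bus,
                            const struct toy_path_entry *path, size_t len)
{
    int bus = start_bus;

    if (len == 0) {
        return -1;
    }
    for (size_t i = 0; i + 1 < len; i++) {
        bus = toy_secondary_bus(bridges, n, (uint8_t)bus,
                                path[i].dev, path[i].fn);
        if (bus < 0) {
            return -1;
        }
    }
    return (bus << 8) | ((path[len - 1].dev & 0x1f) << 3) |
           (path[len - 1].fn & 0x7);
}
```

For the topology in this thread, the path {02.0, 00.0, 01.0} from root bus
0 resolves to 0x208 with the current numbering, and the same unchanged
path still reaches the device if the OS renumbers the intermediate buses.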

Alex

[1] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#virtual-i-o-translation-viot-table-header
[2] https://cdrdv2-public.intel.com/671081/vt-directed-io-spec.pdf




* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-11  7:14   ` Jason Wang
@ 2023-01-18 18:38     ` Eric Auger
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2023-01-18 18:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jean-Philippe Brucker, qemu list, Peter Xu, Alex Williamson,
	Michael S. Tsirkin

Hi Jason,

On 1/11/23 08:14, Jason Wang wrote:
> On Tue, Jan 10, 2023 at 5:11 AM Eric Auger <eauger@redhat.com> wrote:
>> So after further investigations looks this is not a problem of bus
>> number, which is good at the time of the virtio cmd calls but rather a
>> problem related to the devfn (0 was used when creating the IOMMU MR)
>> whereas the virtio-iommu cmds looks for the non aliased devfn. With that
>> fixed, the probe and attach at least succeeds. The device still does not
>> work for me but I will continue my investigations and send a tentative fix.
> 
> Haven't thought this deeply, just one thing in my mind and in case
> that may help:
Sorry for the delay, I did not see the follow-ups on this thread :-(,
> 
> intel-iommu doesn't use bus no as the key for hashing address spaces
> since it could be configured by the guest:
> 
> /*
>  * Note that we use pointer to PCIBus as the key, so hashing/shifting
>  * based on the pointer value is intended. Note that we deal with
>  * collisions through vtd_as_equal().
>  */
> static guint vtd_as_hash(gconstpointer v)
> {
>     const struct vtd_as_key *key = v;
>     guint value = (guint)(uintptr_t)key->bus;
> 
>     return (guint)(value << 8 | key->devfn);
> }
I think we have something similar in virtio-iommu. We use the old
flavour, "as_by_busptr", whose key is the PCIBus pointer. This was
basically copied from intel-iommu, where you later replaced it with
commit da8d439c8048 ("intel-iommu: drop VTDBus").

Thanks

Eric

* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-13 12:39   ` Jean-Philippe Brucker
  2023-01-13 17:57     ` Alex Williamson
@ 2023-01-18 18:40     ` Eric Auger
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Auger @ 2023-01-18 18:40 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: qemu list, Peter Xu, Alex Williamson, Michael S. Tsirkin,
	jasowang@redhat.com

Hi Jean,

On 1/13/23 13:39, Jean-Philippe Brucker wrote:
> Hi,
> 
> On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:
>>> Jean, do you have any idea about how to fix that? Do you think we have a
>>> trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
>>> like virtio probe and attach commands are called too early, before the
>>> bus is actually correctly numbered.
>>
>> So after further investigations looks this is not a problem of bus
>> number, which is good at the time of the virtio cmd calls but rather a
>> problem related to the devfn (0 was used when creating the IOMMU MR)
>> whereas the virtio-iommu cmds looks for the non aliased devfn. With that
>> fixed, the probe and attach at least succeeds. The device still does not
>> work for me but I will continue my investigations and send a tentative fix.
> 
> If I remember correctly VIOT can deal with bus numbers because bridges are
> assigned a range by QEMU, but I haven't tested that in detail, and I don't
> know how it holds with conventional PCI bridges. Do you have an example
> command-line I could use to experiment (and the fix you're mentioning)?

You will find command line examples in

[RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr()
https://lore.kernel.org/all/20230116124709.793084-1-eric.auger@redhat.com/

Please let me know if you need additional details.

Eric

* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-18 18:28         ` Alex Williamson
@ 2023-01-18 18:48           ` Eric Auger
  2023-01-20 15:35           ` Jean-Philippe Brucker
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Auger @ 2023-01-18 18:48 UTC (permalink / raw)
  To: Alex Williamson, Jean-Philippe Brucker
  Cc: qemu list, Peter Xu, Michael S. Tsirkin, jasowang@redhat.com

Hi,

On 1/18/23 19:28, Alex Williamson wrote:
> On Wed, 18 Jan 2023 18:03:13 +0000
> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> 
>> On Fri, Jan 13, 2023 at 10:57:00AM -0700, Alex Williamson wrote:
>>> On Fri, 13 Jan 2023 12:39:18 +0000
>>> Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
>>>   
>>>> Hi,
>>>>
>>>> On Mon, Jan 09, 2023 at 10:11:19PM +0100, Eric Auger wrote:  
>>>>>> Jean, do you have any idea about how to fix that? Do you think we have a
>>>>>> trouble in the acpi/viot setup or virtio-iommu probe sequence. It looks
>>>>>> like virtio probe and attach commands are called too early, before the
>>>>>> bus is actually correctly numbered.    
>>>>>
>>>>> So after further investigations looks this is not a problem of bus
>>>>> number, which is good at the time of the virtio cmd calls but rather a
>>>>> problem related to the devfn (0 was used when creating the IOMMU MR)
>>>>> whereas the virtio-iommu cmds looks for the non aliased devfn. With that
>>>>> fixed, the probe and attach at least succeeds. The device still does not
>>>>> work for me but I will continue my investigations and send a tentative fix.    
>>>>
>>>> If I remember correctly VIOT can deal with bus numbers because bridges are
>>>> assigned a range by QEMU, but I haven't tested that in detail, and I don't
>>>> know how it holds with conventional PCI bridges.  
>>>
>>> In my reading of the virtio-iommu spec,  
>>
>> Hm, is that the virtio-iommu spec or ACPI VIOT/device tree spec?
>> The virtio-iommu spec shouldn't refer to PCI buses at the moment. The
>> intent is that for PCI, the "endpoint ID" passed in an ATTACH request
>> corresponds to PCI segment and RID of PCI devices at the time of the
>> request (so after the OS renumbered the buses). If you found something in
>> the spec that contradicts this, it should be fixed. Note that "endpoint"
>> is a misnomer, it can refer to PCI bridges as well, anything that can
>> issue DMA transactions.
> 
> Sorry, the ACPI spec defining the VIOT table[1]:
> 
> 	Each node identifies one or more devices using either their PCI
> 	Handle or their base MMIO (Memory-Mapped I/O) address. A PCI
> 	Handle is a PCI Segment number and a BDF (Bus-Device-Function)
> 	with the following layout:
> 
> 	* Bits 15:8 Bus Number
> 
> 	* Bits 7:3 Device Number
> 
> 	* Bits 2:0 Function Number
> 
> 	This identifier corresponds to the one observed by the
> 	operating system when parsing the PCI configuration space for
> 	the first time after boot.
> 
>>> I noted that it specifies the
>>> bus numbers *at the time of OS handoff*, so it essentially washes its
>>> hands of the OS renumbering buses while leaving subtle dependencies on
>>> initial numbering in the guest and QEMU implementations.  
>>
>> Yes we needed to describe in the firmware tables (device-tree and ACPI
>> VIOT) which devices the IOMMU manages. And at the time we generate the
>> tables, if we want to refer to PCI devices behind bridges, we can either
>> use catch-all ranges for any possible bus numbers they will get, or
>> initialize bus numbers in bridges and pass those to the OS.
>>
>> But that's only to communicate the IOMMU topology to the OS, because we
>> couldn't come up with anything better. After it sets up PCI the OS should
>> be able to use its own configuration of the PCI topology in virtio-iommu
>> requests.
> 
> The VT-d spec[2](8.3.1) has a more elegant solution using a path
> described in a device scope, based on a root bus number (not
> susceptible to OS renumbering) and a sequence of devfns to uniquely
> describe a hierarchy or endpoint, invariant of OS bus renumbering.
> Thanks,

Independently of the potential issue raised by Alex about later bus
renumbering, I observe that the VIOT content, in my case, is correct and
properly advertises the translation of the RIDs of all my devices. So
the iommu group topology issue I have in the guest is not due to the VIOT
ACPI table content.
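
For reference, the PCI Handle layout quoted above (bits 15:8 bus, 7:3
device, 2:0 function) can be sketched as below. This is only an
illustration: the helper names pack_bdf and unpack_bdf are made up for
this example and are not QEMU or kernel functions.

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Build a 16-bit BDF: bits 15:8 bus, 7:3 device, 2:0 function."""
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus number must fit in 8 bits")
    if not 0 <= device <= 0x1F:
        raise ValueError("device number must fit in 5 bits")
    if not 0 <= function <= 0x7:
        raise ValueError("function number must fit in 3 bits")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf: int) -> tuple[int, int, int]:
    """Split a 16-bit BDF back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


if __name__ == "__main__":
    # 00:01.0 gives BDF 0x8, matching the "bus=0 devfn=0x8" traces
    # at the start of the thread.
    print(hex(pack_bdf(0, 1, 0)))
    # 02:01.0, the assigned NIC behind the PCIe-to-PCI bridge.
    print(hex(pack_bdf(2, 1, 0)))
    print(unpack_bdf(0x0208))
```

The low byte of the BDF is the devfn, which is why a device aliased to
devfn 0 and the same device looked up by its own devfn yield different
identifiers even when the bus number is right.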

Eric
> 
> Alex
> 
> [1]https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#virtual-i-o-translation-viot-table-header
> [2]https://cdrdv2-public.intel.com/671081/vt-directed-io-spec.pdf
> 




* Re: virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group
  2023-01-18 18:28         ` Alex Williamson
  2023-01-18 18:48           ` Eric Auger
@ 2023-01-20 15:35           ` Jean-Philippe Brucker
  1 sibling, 0 replies; 11+ messages in thread
From: Jean-Philippe Brucker @ 2023-01-20 15:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Eric Auger, qemu list, Peter Xu, Michael S. Tsirkin,
	jasowang@redhat.com

On Wed, Jan 18, 2023 at 11:28:32AM -0700, Alex Williamson wrote:
> The VT-d spec[2](8.3.1) has a more elegant solution using a path
> described in a device scope, based on a root bus number (not
> susceptible to OS renumbering) and a sequence of devfns to uniquely
> describe a hierarchy or endpoint, invariant of OS bus renumbering.

That's a good idea, we could describe the hierarchy using only devfns.
I think I based VIOT mostly on IORT and device-tree which don't provide
that as far as I know, but could have studied DMAR better. One problem is
that for virtio-iommu we'd need to update both device-tree and VIOT (and
neither is easy to change).
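
A rough sketch of the path idea, assuming a toy model where each bridge
is only known by its (bus, device, function) and the secondary bus
number currently programmed into it. The function name resolve_path and
the dictionary layout are hypothetical, chosen for illustration; this is
not how DMAR, VIOT or QEMU encode anything.

```python
def resolve_path(root_bus, path, secondary_bus):
    """Resolve a root bus plus a list of (device, function) hops to the
    (bus, device, function) the last hop has right now.

    secondary_bus maps a bridge's (bus, device, function) to the
    secondary bus number currently programmed into that bridge.
    """
    if not path:
        raise ValueError("path needs at least one (device, function) hop")
    bus = root_bus
    # Every hop except the last one is a bridge we have to cross.
    for device, function in path[:-1]:
        key = (bus, device, function)
        if key not in secondary_bus:
            raise LookupError("no bridge at %02x:%02x.%x" % key)
        bus = secondary_bus[key]
    device, function = path[-1]
    return bus, device, function


if __name__ == "__main__":
    # Path to the assigned NIC in the topology from the first mail:
    # 00:02.0 -> 01:00.0 -> 02:01.0
    nic_path = [(2, 0), (0, 0), (1, 0)]

    # Bus numbers as shown by lspci in the first mail.
    initial = {(0, 2, 0): 1, (1, 0, 0): 2}
    print(resolve_path(0, nic_path, initial))

    # Made-up renumbering by the OS: same path, different bus numbers.
    renumbered = {(0, 2, 0): 5, (5, 0, 0): 6}
    print(resolve_path(0, nic_path, renumbered))
```

The path itself never mentions a bus number below the root, so it stays
valid whatever numbering the OS ends up choosing; only the resolved BDF
changes.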

But it's worth thinking about because it would solve a problem we
currently have, that a virtio-iommu using the virtio-pci transport cannot
be placed behind a bridge, including a root port, because the firmware
tables cannot refer to it.

Thanks,
Jean


end of thread, other threads:[~2023-01-20 15:36 UTC | newest]

Thread overview: 11+ messages
2023-01-09 13:24 virtio-iommu issue with VFIO device downstream to a PCIe-to-PCI bridge: VFIO devices are not assigned any iommu group Eric Auger
2023-01-09 21:11 ` Eric Auger
2023-01-11  7:14   ` Jason Wang
2023-01-18 18:38     ` Eric Auger
2023-01-13 12:39   ` Jean-Philippe Brucker
2023-01-13 17:57     ` Alex Williamson
2023-01-18 18:03       ` Jean-Philippe Brucker
2023-01-18 18:28         ` Alex Williamson
2023-01-18 18:48           ` Eric Auger
2023-01-20 15:35           ` Jean-Philippe Brucker
2023-01-18 18:40     ` Eric Auger
