From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jag Raman <jag.raman@oracle.com>
Cc: "Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
"John Johnson" <john.g.johnson@oracle.com>,
"thuth@redhat.com" <thuth@redhat.com>,
"bleal@redhat.com" <bleal@redhat.com>,
"swapnil.ingle@nutanix.com" <swapnil.ingle@nutanix.com>,
"john.levon@nutanix.com" <john.levon@nutanix.com>,
"Philippe Mathieu-Daudé" <philmd@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
"wainersm@redhat.com" <wainersm@redhat.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"thanos.makatos@nutanix.com" <thanos.makatos@nutanix.com>,
"Marc-André Lureau" <marcandre.lureau@gmail.com>,
"crosa@redhat.com" <crosa@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"alex.bennee@linaro.org" <alex.bennee@linaro.org>
Subject: Re: [PATCH v4 11/14] vfio-user: IOMMU support for remote device
Date: Mon, 20 Dec 2021 14:36:52 +0000
Message-ID: <YcCVBLdd/WtLR49h@stefanha-x1.localdomain>
In-Reply-To: <A2ABC44E-0EDA-4FDB-B3A4-64CE1AA84560@oracle.com>
On Fri, Dec 17, 2021 at 08:00:35PM +0000, Jag Raman wrote:
> > On Dec 16, 2021, at 9:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Wed, Dec 15, 2021 at 10:35:35AM -0500, Jagannathan Raman wrote:
> >> Assign separate address space for each device in the remote processes.
> >
> > If I understand correctly this isn't really an IOMMU. It's abusing the
> > IOMMU APIs to create isolated address spaces for each device. This way
> > memory regions added by the vfio-user client do not conflict when there
> > are multiple vfio-user servers.
>
> Like you already figured out, having an isolated DMA address space alone is not
> sufficient for this application; we also needed to isolate the sysmem/RAM address
> space. As such, the available IOMMU APIs alone were not sufficient, so we had
> to improvise.
>
> >
> > Calling pci_root_bus_new() and keeping one PCI bus per VfuObject might
> > be a cleaner approach:
> > - Lets you isolate both PCI Memory Space and IO Space.
> > - Isolates the PCIDevices and their addresses on the bus.
> > - Isolates irqs.
> > - No more need to abuse the IOMMU API.
>
> I believe we would still need an IOMMU, because devices use the
> pci_dma_read()/_write() functions. These functions look up the address in the
> DMA address space (via pci_get_address_space() -> PCIDevice->bus_master_as ->
> PCIDevice->bus_master_enable_region -> PCIDevice->bus_master_container_region).
> bus_master_enable_region and bus_master_container_region are effectively aliases
> of the DMA address space - without an IOMMU, the dma_as would be the shared
> global sysmem/RAM space (address_space_mem; see pci_init_bus_master()).

Good point: that code assumes there is a global address space. Creating
a fake IOMMU works around that assumption, but it seems cleaner to
eliminate the assumption itself:

  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
  {
      ...
      if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
          return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
      }
      return &address_space_memory;
             ^^^^^^^^^^^^^^^^^^^^
  }

When creating a PCI root bus, an AddressSpace argument could be
provided, just as the address_space_memory and address_space_io
MemoryRegions already are. Then the hardcoded return could be changed to
something like:

      return bus->dma_address_space;
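
Roughly, what I have in mind is the following (the dma_as argument and
the bus->dma_address_space field are hypothetical, not existing QEMU
API):

  /* pci_root_bus_new() could grow a DMA AddressSpace argument that is
   * stored in the bus, mirroring the existing MemoryRegion arguments: */
  PCIBus *pci_root_bus_new(DeviceState *parent, const char *name,
                           MemoryRegion *address_space_mem,
                           MemoryRegion *address_space_io,
                           AddressSpace *dma_as, /* new argument */
                           uint8_t devfn_min, const char *typename);

  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
  {
      PCIBus *bus = pci_get_bus(dev);
      ...
      /* IOMMU lookup as today, then fall back to the bus's own DMA
       * address space instead of the global address_space_memory: */
      return bus->dma_address_space;
  }
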
> >> @@ -332,6 +336,12 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
> >>                            vfu_object_bar_handlers[i],
> >>                            VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
> >>
> >> +        if ((o->pci_dev->io_regions[i].type & PCI_BASE_ADDRESS_SPACE) == 0) {
> >> +            memory_region_unref(o->pci_dev->io_regions[i].address_space);
> >> +            o->pci_dev->io_regions[i].address_space =
> >> +                remote_iommu_get_ram(o->pci_dev);
> >> +        }
> >
> > This looks hacky. If you create a separate PCIHost for each device
> > instead then the BARs will be created in the MemoryRegion (confusingly
> > named "address_space" in the PCI code) of your choosing.
>
> I was also not very comfortable with this - I added it very grudgingly out of
> necessity. Thank god this can go away with a separate bus for each device.

I talked to Kevin Wolf about having separate buses. qdev currently
requires each DeviceState to have a parent bus, and each bus must have a
parent DeviceState. There is only one exception: a special check that
allows the global system bus (sysbus_get_default()) to be created
without a parent DeviceState.

This restriction probably needs to be loosened in order to support an
isolated PCIHost for each vfio-user server. The challenge is that
qdev_find_recursive() and monitor commands like device_add currently
only search the global system bus. Maybe new syntax is needed for the
multiple-root-bus case, or the behavior of existing monitor commands
needs to be understood and extended without breaking anything.
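
For illustration, per-server bus creation could then look roughly like
this (a sketch only: the dma_as argument is the hypothetical change from
above, and the VfuObject field names are made up):

  /* Each VfuObject gets its own memory/IO containers, DMA address space
   * and PCI root bus, so nothing is shared between servers. */
  memory_region_init(&o->mr_sysmem, OBJECT(o), "vfu-sysmem", UINT64_MAX);
  memory_region_init(&o->mr_io, OBJECT(o), "vfu-io", 65536);
  address_space_init(&o->dma_as, &o->mr_sysmem, "vfu-dma");

  o->pci_bus = pci_root_bus_new(NULL, /* no parent DeviceState: this is
                                       * exactly where the qdev
                                       * restriction bites */
                                "vfu-pci", &o->mr_sysmem, &o->mr_io,
                                &o->dma_as, /* hypothetical new argument */
                                0, TYPE_PCIE_BUS);
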
> >
> > Also, why is PCI Memory Space isolated via VFUIOMMU but PCI IO Space is
> > not?
>
> If I understand correctly, the IO address space translates a sysmem address into
> direct device access (such as I2C). Once we are inside a device, we already
> have access to all parts of the device (unlike RAM, which sits outside the
> device). So I didn't think a device would go via the IOMMU to access IO Space.
> I also didn't see any other IOMMU translating IO address space accesses.

I reviewed how BARs are configured with VFIO:

1. When the guest writes to the vfio-pci PCIDevice's Configuration
   Space, the write is forwarded to the VFIO device (i.e. vfio-user or
   the VFIO kernel ioctl).

2. The vfio-user server receives the Configuration Space write and
   forwards it to pci_dev (the PCIDevice we're serving up). BAR mappings
   are updated in the vfio-user server so the BAR MemoryRegions are
   mapped/unmapped at the locations given by the guest.

This applies to both Memory and IO Space accesses.
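
Condensed, the server-side path is something like this (a paraphrase:
pci_host_config_write_common() and the pci_update_mappings() behavior
are existing QEMU code, the handler shape is illustrative):

  /* Illustrative shape of the server's Configuration Space write
   * handler: */
  static void vfu_cfg_write(VfuObject *o, uint32_t offset, uint32_t val,
                            size_t count)
  {
      /* Forward the client's write into the emulated PCIDevice... */
      pci_host_config_write_common(o->pci_dev, offset,
                                   pci_config_size(o->pci_dev), val,
                                   count);
      /* ...pci_default_write_config() then calls pci_update_mappings(),
       * which maps/unmaps each BAR MemoryRegion in
       * io_regions[i].address_space at the guest-programmed address. */
  }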

Because this patch series does not isolate IO Space between VfuObject
instances, the MemoryRegions will collide when two guests map IO Space
BARs of different devices at the same IO Space address. In other words,
vfu_object_bar_rw() uses the global address_space_io, and that means
collisions can occur.
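
To make the collision concrete, both servers effectively end up doing
the equivalent of this against the single global IO Space (the port
number and region names are invented for illustration):

  MemoryRegion *io = get_system_io(); /* the one global IO Space root */

  /* Server A's device maps its IO BAR at port 0xc000... */
  memory_region_add_subregion_overlap(io, 0xc000, bar_a, 1);
  /* ...and server B's device maps its IO BAR at the same port. Both
   * subregions now overlap at equal priority, so the two guests'
   * accesses can no longer be told apart. */
  memory_region_add_subregion_overlap(io, 0xc000, bar_b, 1);
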
Stefan