From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53282) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1asoEJ-0001gJ-Mc for qemu-devel@nongnu.org; Wed, 20 Apr 2016 05:15:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1asoEE-0003Qs-V3 for qemu-devel@nongnu.org; Wed, 20 Apr 2016 05:15:27 -0400 Received: from mail-pf0-x22f.google.com ([2607:f8b0:400e:c00::22f]:33033) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1asoEE-0003QJ-Gb for qemu-devel@nongnu.org; Wed, 20 Apr 2016 05:15:22 -0400 Received: by mail-pf0-x22f.google.com with SMTP id 184so16574866pff.0 for ; Wed, 20 Apr 2016 02:15:20 -0700 (PDT) References: <1459762426-18440-1-git-send-email-aik@ozlabs.ru> <1459762426-18440-15-git-send-email-aik@ozlabs.ru> <20160407004056.GE16485@voom.fritz.box> From: Alexey Kardashevskiy Message-ID: <571748A3.4070105@ozlabs.ru> Date: Wed, 20 Apr 2016 19:15:15 +1000 MIME-Version: 1.0 In-Reply-To: <20160407004056.GE16485@voom.fritz.box> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH qemu v15 14/17] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping being used by VFIO List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alex Williamson , Alexander Graf On 04/07/2016 10:40 AM, David Gibson wrote: > On Mon, Apr 04, 2016 at 07:33:43PM +1000, Alexey Kardashevskiy wrote: >> The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU - >> a guest view of the table and a hardware TCE table. If there is no VFIO >> presense in the address space, then just the guest view is used, if >> this is the case, it is allocated in the KVM. However since there is no >> support yet for VFIO in KVM TCE hypercalls, when we start using VFIO, >> we need to move the guest view from KVM to the userspace; and we need >> to do this for every IOMMU on a bus with VFIO devices. >> >> This adds vfio_start/vfio_stop callbacks in MemoryRegionIOMMUOps to >> notifiy IOMMU about changing environment so it can reallocate the table >> to/from KVM or (when available) hook the IOMMU groups with the logical >> bus (LIOBN) in the KVM. >> >> This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug >> path as the new callbacks do this better - they notify IOMMU at >> the exact moment when the configuration is changed, and this also >> includes the case of PCI hot unplug. >> >> As there can be multiple containers attached to the same PHB/LIOBN, >> this replaces the @need_vfio flag in sPAPRTCETable with the counter >> of VFIO users. >> >> Signed-off-by: Alexey Kardashevskiy > > This looks correct, but there's one remaining ugly. > >> --- >> Changes: >> v15: >> * s/need_vfio/vfio-Users/g >> --- >> hw/ppc/spapr_iommu.c | 30 ++++++++++++++++++++---------- >> hw/ppc/spapr_pci.c | 6 ------ >> hw/vfio/common.c | 9 +++++++++ >> include/exec/memory.h | 4 ++++ >> include/hw/ppc/spapr.h | 2 +- >> 5 files changed, 34 insertions(+), 17 deletions(-) >> >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c >> index c945dba..ea09414 100644 >> --- a/hw/ppc/spapr_iommu.c >> +++ b/hw/ppc/spapr_iommu.c >> @@ -155,6 +155,16 @@ static uint64_t spapr_tce_get_page_sizes(MemoryRegion *iommu) >> return 1ULL << tcet->page_shift; >> } >> >> +static void spapr_tce_vfio_start(MemoryRegion *iommu) >> +{ >> + spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true); >> +} >> + >> +static void spapr_tce_vfio_stop(MemoryRegion *iommu) >> +{ >> + spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false); >> +} >> + >> static void spapr_tce_table_do_enable(sPAPRTCETable *tcet); >> static void spapr_tce_table_do_disable(sPAPRTCETable *tcet); >> >> @@ -239,6 +249,8 @@ static const VMStateDescription vmstate_spapr_tce_table = { >> static MemoryRegionIOMMUOps spapr_iommu_ops = { >> .translate = spapr_tce_translate_iommu, >> .get_page_sizes = spapr_tce_get_page_sizes, >> + .vfio_start = spapr_tce_vfio_start, >> + .vfio_stop = spapr_tce_vfio_stop, > > Ok, so AFAICT these callbacks are called whenever a VFIO context is > added / removed from the gIOMMU's address space, and it's up to the > gIOMMU code to ref count that to see if there are any current vfio > users. That makes "vfio_start" and "vfio_stop" not great names. > > But.. better than changing the names would be to move the refcounting > to the generic code if you can manage it, so the individual gIOMMU > backends don't need to - they just told when they need to start / stop > providing VFIO support. Everything is manageable... This referencing is needed for the case of >=2 containers so 2xvfio_listener_region_add will create 2xVFIOGuestIOMMU as they are per VFIOContainer so VFIOGuestIOMMU is not the right place for the reference counting, VFIOAddressSpace seems to be that place (=> add list of IOMMU MRs with refcounter). Or even IOMMU MR. Or move VFIOGuestIOMMU list from VFIOContainer to VFIOAddressSpace and then gIOMMU can handle refcounting? > >> }; >> >> static int spapr_tce_table_realize(DeviceState *dev) >> @@ -248,7 +260,7 @@ static int spapr_tce_table_realize(DeviceState *dev) >> char tmp[32]; >> >> tcet->fd = -1; >> - tcet->need_vfio = false; >> + tcet->vfio_users = 0; >> snprintf(tmp, sizeof(tmp), "tce-root-%x", tcet->liobn); >> memory_region_init(&tcet->root, tcetobj, tmp, UINT64_MAX); >> >> @@ -268,20 +280,18 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio) >> size_t table_size = tcet->nb_table * sizeof(uint64_t); >> void *newtable; >> >> - if (need_vfio == tcet->need_vfio) { >> - /* Nothing to do */ >> - return; >> - } >> + tcet->vfio_users += need_vfio ? 1 : -1; >> + g_assert(tcet->vfio_users >= 0); >> + g_assert(tcet->table); >> >> - if (!need_vfio) { >> + if (!tcet->vfio_users) { >> /* FIXME: We don't support transition back to KVM accelerated >> * TCEs yet */ >> return; >> } >> >> - tcet->need_vfio = true; >> - >> - if (tcet->fd < 0) { >> + if (tcet->vfio_users > 1) { >> + g_assert(tcet->fd < 0); >> /* Table is already in userspace, nothing to be do */ >> return; >> } >> @@ -327,7 +337,7 @@ static void spapr_tce_table_do_enable(sPAPRTCETable *tcet) >> tcet->page_shift, >> tcet->nb_table, >> &tcet->fd, >> - tcet->need_vfio); >> + tcet->vfio_users != 0); >> >> memory_region_set_size(&tcet->iommu, >> (uint64_t)tcet->nb_table << tcet->page_shift); >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >> index 5497a18..f864fde 100644 >> --- a/hw/ppc/spapr_pci.c >> +++ b/hw/ppc/spapr_pci.c >> @@ -1083,12 +1083,6 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc, >> void *fdt = NULL; >> int fdt_start_offset = 0, fdt_size; >> >> - if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { >> - sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn); >> - >> - spapr_tce_set_need_vfio(tcet, true); >> - } >> - >> if (dev->hotplugged) { >> fdt = create_device_tree(&fdt_size); >> fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0); >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index ea79311..5e5b77c 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -421,6 +421,9 @@ static void vfio_listener_region_add(MemoryListener *listener, >> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); >> >> memory_region_register_iommu_notifier(giommu->iommu, &giommu->n); >> + if (section->mr->iommu_ops && section->mr->iommu_ops->vfio_start) { >> + section->mr->iommu_ops->vfio_start(section->mr); >> + } >> memory_region_iommu_replay(giommu->iommu, &giommu->n, >> false); >> >> @@ -466,6 +469,7 @@ static void vfio_listener_region_del(MemoryListener *listener, >> VFIOContainer *container = container_of(listener, VFIOContainer, listener); >> hwaddr iova, end; >> int ret; >> + MemoryRegion *iommu = NULL; >> >> if (vfio_listener_skipped_section(section)) { >> trace_vfio_listener_region_del_skip( >> @@ -487,6 +491,7 @@ static void vfio_listener_region_del(MemoryListener *listener, >> QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) { >> if (giommu->iommu == section->mr) { >> memory_region_unregister_iommu_notifier(&giommu->n); >> + iommu = giommu->iommu; >> QLIST_REMOVE(giommu, giommu_next); >> g_free(giommu); >> break; >> @@ -519,6 +524,10 @@ static void vfio_listener_region_del(MemoryListener *listener, >> "0x%"HWADDR_PRIx") = %d (%m)", >> container, iova, end - iova, ret); >> } >> + >> + if (iommu && iommu->iommu_ops && iommu->iommu_ops->vfio_stop) { >> + iommu->iommu_ops->vfio_stop(section->mr); >> + } >> } >> >> static const MemoryListener vfio_memory_listener = { >> diff --git a/include/exec/memory.h b/include/exec/memory.h >> index eb5ce67..f1de133f 100644 >> --- a/include/exec/memory.h >> +++ b/include/exec/memory.h >> @@ -152,6 +152,10 @@ struct MemoryRegionIOMMUOps { >> IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write); >> /* Returns supported page sizes */ >> uint64_t (*get_page_sizes)(MemoryRegion *iommu); >> + /* Called when VFIO starts using this */ >> + void (*vfio_start)(MemoryRegion *iommu); >> + /* Called when VFIO stops using this */ >> + void (*vfio_stop)(MemoryRegion *iommu); >> }; >> >> typedef struct CoalescedMemoryRange CoalescedMemoryRange; >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h >> index 471eb4a..5c00e38 100644 >> --- a/include/hw/ppc/spapr.h >> +++ b/include/hw/ppc/spapr.h >> @@ -548,7 +548,7 @@ struct sPAPRTCETable { >> uint32_t mig_nb_table; >> uint64_t *mig_table; >> bool bypass; >> - bool need_vfio; >> + int vfio_users; >> int fd; >> MemoryRegion root, iommu; >> struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */ > -- Alexey