From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36852) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiFkC-0002ur-OG for qemu-devel@nongnu.org; Tue, 22 Mar 2016 02:24:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aiFk7-0003Yg-U0 for qemu-devel@nongnu.org; Tue, 22 Mar 2016 02:24:44 -0400 Received: from mail-pf0-x244.google.com ([2607:f8b0:400e:c00::244]:36827) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aiFk7-0003Ya-HL for qemu-devel@nongnu.org; Tue, 22 Mar 2016 02:24:39 -0400 Received: by mail-pf0-x244.google.com with SMTP id q129so34564373pfb.3 for ; Mon, 21 Mar 2016 23:24:39 -0700 (PDT) References: <1458546426-26222-1-git-send-email-aik@ozlabs.ru> <1458546426-26222-17-git-send-email-aik@ozlabs.ru> <20160322044559.GE23586@voom.redhat.com> From: Alexey Kardashevskiy Message-ID: <56F0E521.6060506@ozlabs.ru> Date: Tue, 22 Mar 2016 17:24:33 +1100 MIME-Version: 1.0 In-Reply-To: <20160322044559.GE23586@voom.redhat.com> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH qemu v14 16/18] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping being used by VFIO List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org On 03/22/2016 03:45 PM, David Gibson wrote: > On Mon, Mar 21, 2016 at 06:47:04PM +1100, Alexey Kardashevskiy wrote: >> The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU - >> a guest view of the table and a hardware TCE table. If there is no VFIO >> presense in the address space, then just the guest view is used, if >> this is the case, it is allocated in the KVM. However since there is no >> support yet for VFIO in KVM TCE hypercalls, when we start using VFIO, >> we need to move the guest view from KVM to the userspace; and we need >> to do this for every IOMMU on a bus with VFIO devices. >> >> This adds vfio_start/vfio_stop callbacks in MemoryRegionIOMMUOps to >> notifiy IOMMU about changing environment so it can reallocate the table >> to/from KVM or (when available) hook the IOMMU groups with the logical >> bus (LIOBN) in the KVM. >> >> This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug >> path as the new callbacks do this better - they notify IOMMU at >> the exact moment when the configuration is changed, and this also >> includes the case of PCI hot unplug. >> >> TODO: split into 2 or 3 patches, per maintainership area. >> >> Signed-off-by: Alexey Kardashevskiy > > I'm finding this one much easier to follow than the previous revision. > >> --- >> hw/ppc/spapr_iommu.c | 12 ++++++++++++ >> hw/ppc/spapr_pci.c | 6 ------ >> hw/vfio/common.c | 9 +++++++++ >> include/exec/memory.h | 4 ++++ >> 4 files changed, 25 insertions(+), 6 deletions(-) >> >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c >> index 6dc3c45..702075d 100644 >> --- a/hw/ppc/spapr_iommu.c >> +++ b/hw/ppc/spapr_iommu.c >> @@ -151,6 +151,16 @@ static uint64_t spapr_tce_get_page_sizes(MemoryRegion *iommu) >> return 1ULL << tcet->page_shift; >> } >> >> +static void spapr_tce_vfio_start(MemoryRegion *iommu) >> +{ >> + spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true); >> +} >> + >> +static void spapr_tce_vfio_stop(MemoryRegion *iommu) >> +{ >> + spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false); >> +} > > Wonder if a single callback which takes a boolean might be a little > less clunky. I have a feeling that at least once I was asked to do the opposite and now we have take_ownership/release_ownership. This does not seem to be much different and the existing names are more self-documenting than the previous vfio_notify() or whatever name I could think of. >> static void spapr_tce_table_do_enable(sPAPRTCETable *tcet); >> static void spapr_tce_table_do_disable(sPAPRTCETable *tcet); >> >> @@ -211,6 +221,8 @@ static const VMStateDescription vmstate_spapr_tce_table = { >> static MemoryRegionIOMMUOps spapr_iommu_ops = { >> .translate = spapr_tce_translate_iommu, >> .get_page_sizes = spapr_tce_get_page_sizes, >> + .vfio_start = spapr_tce_vfio_start, >> + .vfio_stop = spapr_tce_vfio_stop, >> }; >> >> static int spapr_tce_table_realize(DeviceState *dev) >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >> index bfcafdf..af99a36 100644 >> --- a/hw/ppc/spapr_pci.c >> +++ b/hw/ppc/spapr_pci.c >> @@ -1121,12 +1121,6 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector *drc, >> void *fdt = NULL; >> int fdt_start_offset = 0, fdt_size; >> >> - if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { >> - sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn); >> - >> - spapr_tce_set_need_vfio(tcet, true); >> - } >> - >> if (dev->hotplugged) { >> fdt = create_device_tree(&fdt_size); >> fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0); >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index b257655..4e873b7 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -421,6 +421,9 @@ static void vfio_listener_region_add(MemoryListener *listener, >> QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next); >> >> memory_region_register_iommu_notifier(giommu->iommu, &giommu->n); >> + if (section->mr->iommu_ops && section->mr->iommu_ops->vfio_start) { >> + section->mr->iommu_ops->vfio_start(section->mr); >> + } >> memory_region_iommu_replay(giommu->iommu, &giommu->n, >> false); >> >> @@ -466,6 +469,7 @@ static void vfio_listener_region_del(MemoryListener *listener, >> VFIOContainer *container = container_of(listener, VFIOContainer, listener); >> hwaddr iova, end; >> int ret; >> + MemoryRegion *iommu = NULL; >> >> if (vfio_listener_skipped_section(section)) { >> trace_vfio_listener_region_del_skip( >> @@ -487,6 +491,7 @@ static void vfio_listener_region_del(MemoryListener *listener, >> QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) { >> if (giommu->iommu == section->mr) { >> memory_region_unregister_iommu_notifier(&giommu->n); >> + iommu = giommu->iommu; >> QLIST_REMOVE(giommu, giommu_next); >> g_free(giommu); >> break; >> @@ -519,6 +524,10 @@ static void vfio_listener_region_del(MemoryListener *listener, >> "0x%"HWADDR_PRIx") = %d (%m)", >> container, iova, end - iova, ret); >> } >> + >> + if (iommu && iommu->iommu_ops && iommu->iommu_ops->vfio_stop) { >> + iommu->iommu_ops->vfio_stop(section->mr); >> + } > > IIRC there can be multiple containers listening on the same PCI > address space. In that case, this won't be correct, because once one > of the VFIO containers is removed, it will call vfio_stop, even though > the other VFIO container still needs the guest IOMMU to support it. > > So I think you need some sort of refcounting here. Right, missed this bit, good finding. > >> } >> >> static const MemoryListener vfio_memory_listener = { >> diff --git a/include/exec/memory.h b/include/exec/memory.h >> index eb5ce67..f1de133f 100644 >> --- a/include/exec/memory.h >> +++ b/include/exec/memory.h >> @@ -152,6 +152,10 @@ struct MemoryRegionIOMMUOps { >> IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool is_write); >> /* Returns supported page sizes */ >> uint64_t (*get_page_sizes)(MemoryRegion *iommu); >> + /* Called when VFIO starts using this */ >> + void (*vfio_start)(MemoryRegion *iommu); >> + /* Called when VFIO stops using this */ >> + void (*vfio_stop)(MemoryRegion *iommu); >> }; >> >> typedef struct CoalescedMemoryRange CoalescedMemoryRange; > -- Alexey