From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41242) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aht0y-0007bq-Pm for qemu-devel@nongnu.org; Mon, 21 Mar 2016 02:08:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aht0w-0006Ey-8j for qemu-devel@nongnu.org; Mon, 21 Mar 2016 02:08:32 -0400 Received: from mail-pf0-x244.google.com ([2607:f8b0:400e:c00::244]:35941) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aht0v-0006EO-Tj for qemu-devel@nongnu.org; Mon, 21 Mar 2016 02:08:30 -0400 Received: by mail-pf0-x244.google.com with SMTP id q129so28936652pfb.3 for ; Sun, 20 Mar 2016 23:08:29 -0700 (PDT) References: <1456823441-46757-1-git-send-email-aik@ozlabs.ru> <1456823441-46757-12-git-send-email-aik@ozlabs.ru> <20160303063013.GL1620@voom.redhat.com> <56E7793C.3000209@ozlabs.ru> <20160315054230.GH15272@voom.fritz.box> <56EA3ADD.8010704@ozlabs.ru> <20160317061034.GN9032@voom> <56EA7797.5060706@ozlabs.ru> <20160321045358.GH23586@voom.redhat.com> From: Alexey Kardashevskiy Message-ID: <56EF8FD6.5070308@ozlabs.ru> Date: Mon, 21 Mar 2016 17:08:22 +1100 MIME-Version: 1.0 In-Reply-To: <20160321045358.GH23586@voom.redhat.com> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 11/16] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org On 03/21/2016 03:53 PM, David Gibson wrote: > On Thu, Mar 17, 2016 at 08:23:35PM +1100, Alexey Kardashevskiy wrote: >> On 03/17/2016 05:10 PM, David Gibson wrote: >>> On Thu, Mar 17, 2016 at 04:04:29PM +1100, Alexey Kardashevskiy wrote: >>>> On 03/15/2016 04:42 PM, David Gibson wrote: >>>>> On Tue, Mar 15, 2016 at 01:53:48PM +1100, Alexey Kardashevskiy wrote: >>>>>> On 03/03/2016 05:30 PM, David Gibson wrote: >>>>>>> On Tue, Mar 01, 2016 at 08:10:36PM +1100, Alexey Kardashevskiy wrote: >>>>>>>> This makes use of the new "memory registering" feature. The idea is >>>>>>>> to provide the userspace ability to notify the host kernel about pages >>>>>>>> which are going to be used for DMA. Having this information, the host >>>>>>>> kernel can pin them all once per user process, do locked pages >>>>>>>> accounting (once) and not spent time on doing that in real time with >>>>>>>> possible failures which cannot be handled nicely in some cases. >>>>>>>> >>>>>>>> This adds a prereg memory listener which listens on address_space_memory >>>>>>>> and notifies a VFIO container about memory which needs to be >>>>>>>> pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. >>>>>>>> >>>>>>>> As there is no per-IOMMU-type release() callback anymore, this stores >>>>>>>> the IOMMU type in the container so vfio_listener_release() can device >>>>>>>> if it needs to unregister @prereg_listener. >>>>>>>> >>>>>>>> The feature is only enabled for SPAPR IOMMU v2. The host kernel changes >>>>>>>> are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does >>>>>>>> not call it when v2 is detected and enabled. >>>>>>>> >>>>>>>> This does not change the guest visible interface. >>>>>>>> >>>>>>>> Signed-off-by: Alexey Kardashevskiy >>>>>>>> --- >>>>>>>> hw/vfio/Makefile.objs | 1 + >>>>>>>> hw/vfio/common.c | 39 +++++++++--- >>>>>>>> hw/vfio/prereg.c | 138 ++++++++++++++++++++++++++++++++++++++++++ >>>>>>>> include/hw/vfio/vfio-common.h | 4 ++ >>>>>>>> trace-events | 2 + >>>>>>>> 5 files changed, 175 insertions(+), 9 deletions(-) >>>>>>>> create mode 100644 hw/vfio/prereg.c >>>>>>>> >>>>>>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs >>>>>>>> index ceddbb8..5800e0e 100644 >>>>>>>> --- a/hw/vfio/Makefile.objs >>>>>>>> +++ b/hw/vfio/Makefile.objs >>>>>>>> @@ -4,4 +4,5 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o >>>>>>>> obj-$(CONFIG_SOFTMMU) += platform.o >>>>>>>> obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o >>>>>>>> obj-$(CONFIG_SOFTMMU) += amd-xgbe.o >>>>>>>> +obj-$(CONFIG_SOFTMMU) += prereg.o >>>>>>>> endif >>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>>>>>> index 3aaa6b5..f2a03e0 100644 >>>>>>>> --- a/hw/vfio/common.c >>>>>>>> +++ b/hw/vfio/common.c >>>>>>>> @@ -531,6 +531,9 @@ static const MemoryListener vfio_iommu_listener = { >>>>>>>> static void vfio_listener_release(VFIOContainer *container) >>>>>>>> { >>>>>>>> memory_listener_unregister(&container->iommu_listener.listener); >>>>>>>> + if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { >>>>>>>> + memory_listener_unregister(&container->prereg_listener.listener); >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> int vfio_mmap_region(Object *obj, VFIORegion *region, >>>>>>>> @@ -722,8 +725,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>>>>>> goto free_container_exit; >>>>>>>> } >>>>>>>> >>>>>>>> - ret = ioctl(fd, VFIO_SET_IOMMU, >>>>>>>> - v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU); >>>>>>>> + container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU; >>>>>>>> + ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); >>>>>>>> if (ret) { >>>>>>>> error_report("vfio: failed to set iommu for container: %m"); >>>>>>>> ret = -errno; >>>>>>>> @@ -748,8 +751,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>>>>>> if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) { >>>>>>>> container->iova_pgsizes = info.iova_pgsizes; >>>>>>>> } >>>>>>>> - } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) { >>>>>>>> + } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) || >>>>>>>> + ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) { >>>>>>>> struct vfio_iommu_spapr_tce_info info; >>>>>>>> + bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU); >>>>>>>> >>>>>>>> ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd); >>>>>>>> if (ret) { >>>>>>>> @@ -757,7 +762,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>>>>>> ret = -errno; >>>>>>>> goto free_container_exit; >>>>>>>> } >>>>>>>> - ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU); >>>>>>>> + container->iommu_type = >>>>>>>> + v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU; >>>>>>>> + ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); >>>>>>> >>>>>>> It'd be nice to consolidate the setting of container->iommu_type and >>>>>>> then the SET_IOMMU ioctl() rather than having more or less duplicated >>>>>>> logic for Type1 and SPAPR, but it's not a big deal. >>>>>> >>>>>> >>>>>> May be but I cannot think of any nice way of doing this though. >>>>>> >>>>>> >>>>>>> >>>>>>>> if (ret) { >>>>>>>> error_report("vfio: failed to set iommu for container: %m"); >>>>>>>> ret = -errno; >>>>>>>> @@ -769,11 +776,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>>>>>> * when container fd is closed so we do not call it explicitly >>>>>>>> * in this file. >>>>>>>> */ >>>>>>>> - ret = ioctl(fd, VFIO_IOMMU_ENABLE); >>>>>>>> - if (ret) { >>>>>>>> - error_report("vfio: failed to enable container: %m"); >>>>>>>> - ret = -errno; >>>>>>>> - goto free_container_exit; >>>>>>>> + if (!v2) { >>>>>>>> + ret = ioctl(fd, VFIO_IOMMU_ENABLE); >>>>>>>> + if (ret) { >>>>>>>> + error_report("vfio: failed to enable container: %m"); >>>>>>>> + ret = -errno; >>>>>>>> + goto free_container_exit; >>>>>>>> + } >>>>>>>> + } else { >>>>>>>> + container->prereg_listener.container = container; >>>>>>>> + container->prereg_listener.listener = vfio_prereg_listener; >>>>>>>> + >>>>>>>> + memory_listener_register(&container->prereg_listener.listener, >>>>>>>> + &address_space_memory); >>>>>>> >>>>>>> This assumes that the target address space of the (guest) IOMMU is >>>>>>> address_space_memory. Which is fine - vfio already assumes that - but >>>>>>> it reminds me that it'd be nice to have an explicit check for that (I >>>>>>> guess it would have to go in vfio_iommu_map_notify()). So that if >>>>>>> someone constructs a machine where that's not the case, it'll at least >>>>>>> be obvious why VFIO isn't working. >>>>>> >>>>>> Ok, I'll add a small patch for this in the next respin. >>>>> >>>>> Ok. >>>>> >>>>>>>> + if (container->error) { >>>>>>>> + error_report("vfio: RAM memory listener initialization failed for container"); >>>>>>>> + memory_listener_unregister( >>>>>>>> + &container->prereg_listener.listener); >>>>>>>> + goto free_container_exit; >>>>>>>> + } >>>>>>>> } >>>>>>> >>>>>>> Looks like you don't have an error path which will handle the case >>>>>>> where the prereg listener is registered, but registering the normal >>>>>>> PCI AS listener fails - I believe you will fail to unregister the >>>>>>> prereg listener in that case. >>>>>> >>>>>> >>>>>> In this case, the control goes to listener_release_exit: which calls >>>>>> vfio_listener_release() which unregisters both listeners (it is a few chunks >>>>>> above). >>>>> >>>>> Ah.. yes. In which case this could also jump to listener_release_exit >>>>> and avoid the explicit unreg(), yes? >>>> >>>> >>>> Sorry, I do not follow you here. It does jump to >>>> listener_release_exit already. >>> >>> I mean you can use the listener_release_exit label on failure of the >>> prereg listener as well as failure of the regular listener. >> >> >> When vfio_prereg_listener fails, vfio_memory_listener is not registered - it >> just looks cleaner to jump further to free_container_exit than to >> listener_release_exit, does not it? > > If I was writing from scratch, I'd probably do it like that. But the > existing failure path for the PCI address space listener goes to the > label which (optionally) cleans up the PCI address space listener, > which should not be necessary if registration has failed. vfio_listener_release() unconditionally calls memory_listener_unregister() which unconditionally calls QTAILQ_REMOVE. Is it considered safe (the code looks ok) to call QTAILQ_REMOVE() on something which has not been QTAILQ_INSERT_BEFORE()'d? -- Alexey