From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50985) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agQ6y-0002d0-Qj for qemu-devel@nongnu.org; Thu, 17 Mar 2016 01:04:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1agQ6v-0000oW-Gv for qemu-devel@nongnu.org; Thu, 17 Mar 2016 01:04:40 -0400 Received: from mail-pf0-x242.google.com ([2607:f8b0:400e:c00::242]:33356) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agQ6v-0000oM-1E for qemu-devel@nongnu.org; Thu, 17 Mar 2016 01:04:37 -0400 Received: by mail-pf0-x242.google.com with SMTP id x3so9449113pfb.0 for ; Wed, 16 Mar 2016 22:04:36 -0700 (PDT) References: <1456823441-46757-1-git-send-email-aik@ozlabs.ru> <1456823441-46757-12-git-send-email-aik@ozlabs.ru> <20160303063013.GL1620@voom.redhat.com> <56E7793C.3000209@ozlabs.ru> <20160315054230.GH15272@voom.fritz.box> From: Alexey Kardashevskiy Message-ID: <56EA3ADD.8010704@ozlabs.ru> Date: Thu, 17 Mar 2016 16:04:29 +1100 MIME-Version: 1.0 In-Reply-To: <20160315054230.GH15272@voom.fritz.box> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 11/16] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org On 03/15/2016 04:42 PM, David Gibson wrote: > On Tue, Mar 15, 2016 at 01:53:48PM +1100, Alexey Kardashevskiy wrote: >> On 03/03/2016 05:30 PM, David Gibson wrote: >>> On Tue, Mar 01, 2016 at 08:10:36PM +1100, Alexey Kardashevskiy wrote: >>>> This makes use of the new "memory registering" feature. The idea is >>>> to provide the userspace ability to notify the host kernel about pages >>>> which are going to be used for DMA. Having this information, the host >>>> kernel can pin them all once per user process, do locked pages >>>> accounting (once) and not spent time on doing that in real time with >>>> possible failures which cannot be handled nicely in some cases. >>>> >>>> This adds a prereg memory listener which listens on address_space_memory >>>> and notifies a VFIO container about memory which needs to be >>>> pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. >>>> >>>> As there is no per-IOMMU-type release() callback anymore, this stores >>>> the IOMMU type in the container so vfio_listener_release() can device >>>> if it needs to unregister @prereg_listener. >>>> >>>> The feature is only enabled for SPAPR IOMMU v2. The host kernel changes >>>> are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does >>>> not call it when v2 is detected and enabled. >>>> >>>> This does not change the guest visible interface. >>>> >>>> Signed-off-by: Alexey Kardashevskiy >>>> --- >>>> hw/vfio/Makefile.objs | 1 + >>>> hw/vfio/common.c | 39 +++++++++--- >>>> hw/vfio/prereg.c | 138 ++++++++++++++++++++++++++++++++++++++++++ >>>> include/hw/vfio/vfio-common.h | 4 ++ >>>> trace-events | 2 + >>>> 5 files changed, 175 insertions(+), 9 deletions(-) >>>> create mode 100644 hw/vfio/prereg.c >>>> >>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs >>>> index ceddbb8..5800e0e 100644 >>>> --- a/hw/vfio/Makefile.objs >>>> +++ b/hw/vfio/Makefile.objs >>>> @@ -4,4 +4,5 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o >>>> obj-$(CONFIG_SOFTMMU) += platform.o >>>> obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o >>>> obj-$(CONFIG_SOFTMMU) += amd-xgbe.o >>>> +obj-$(CONFIG_SOFTMMU) += prereg.o >>>> endif >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index 3aaa6b5..f2a03e0 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -531,6 +531,9 @@ static const MemoryListener vfio_iommu_listener = { >>>> static void vfio_listener_release(VFIOContainer *container) >>>> { >>>> memory_listener_unregister(&container->iommu_listener.listener); >>>> + if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { >>>> + memory_listener_unregister(&container->prereg_listener.listener); >>>> + } >>>> } >>>> >>>> int vfio_mmap_region(Object *obj, VFIORegion *region, >>>> @@ -722,8 +725,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>> goto free_container_exit; >>>> } >>>> >>>> - ret = ioctl(fd, VFIO_SET_IOMMU, >>>> - v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU); >>>> + container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU; >>>> + ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); >>>> if (ret) { >>>> error_report("vfio: failed to set iommu for container: %m"); >>>> ret = -errno; >>>> @@ -748,8 +751,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>> if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) { >>>> container->iova_pgsizes = info.iova_pgsizes; >>>> } >>>> - } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) { >>>> + } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) || >>>> + ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) { >>>> struct vfio_iommu_spapr_tce_info info; >>>> + bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU); >>>> >>>> ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd); >>>> if (ret) { >>>> @@ -757,7 +762,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>> ret = -errno; >>>> goto free_container_exit; >>>> } >>>> - ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU); >>>> + container->iommu_type = >>>> + v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU; >>>> + ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); >>> >>> It'd be nice to consolidate the setting of container->iommu_type and >>> then the SET_IOMMU ioctl() rather than having more or less duplicated >>> logic for Type1 and SPAPR, but it's not a big deal. >> >> >> May be but I cannot think of any nice way of doing this though. >> >> >>> >>>> if (ret) { >>>> error_report("vfio: failed to set iommu for container: %m"); >>>> ret = -errno; >>>> @@ -769,11 +776,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as) >>>> * when container fd is closed so we do not call it explicitly >>>> * in this file. >>>> */ >>>> - ret = ioctl(fd, VFIO_IOMMU_ENABLE); >>>> - if (ret) { >>>> - error_report("vfio: failed to enable container: %m"); >>>> - ret = -errno; >>>> - goto free_container_exit; >>>> + if (!v2) { >>>> + ret = ioctl(fd, VFIO_IOMMU_ENABLE); >>>> + if (ret) { >>>> + error_report("vfio: failed to enable container: %m"); >>>> + ret = -errno; >>>> + goto free_container_exit; >>>> + } >>>> + } else { >>>> + container->prereg_listener.container = container; >>>> + container->prereg_listener.listener = vfio_prereg_listener; >>>> + >>>> + memory_listener_register(&container->prereg_listener.listener, >>>> + &address_space_memory); >>> >>> This assumes that the target address space of the (guest) IOMMU is >>> address_space_memory. Which is fine - vfio already assumes that - but >>> it reminds me that it'd be nice to have an explicit check for that (I >>> guess it would have to go in vfio_iommu_map_notify()). So that if >>> someone constructs a machine where that's not the case, it'll at least >>> be obvious why VFIO isn't working. >> >> Ok, I'll add a small patch for this in the next respin. > > Ok. > >>>> + if (container->error) { >>>> + error_report("vfio: RAM memory listener initialization failed for container"); >>>> + memory_listener_unregister( >>>> + &container->prereg_listener.listener); >>>> + goto free_container_exit; >>>> + } >>>> } >>> >>> Looks like you don't have an error path which will handle the case >>> where the prereg listener is registered, but registering the normal >>> PCI AS listener fails - I believe you will fail to unregister the >>> prereg listener in that case. >> >> >> In this case, the control goes to listener_release_exit: which calls >> vfio_listener_release() which unregisters both listeners (it is a few chunks >> above). > > Ah.. yes. In which case this could also jump to listener_release_exit > and avoid the explicit unreg(), yes? Sorry, I do not follow you here. It does jump to listener_release_exit already. > >>>> /* >>>> diff --git a/hw/vfio/prereg.c b/hw/vfio/prereg.c >>>> new file mode 100644 >>>> index 0000000..66cd696 >>>> --- /dev/null >>>> +++ b/hw/vfio/prereg.c >>>> @@ -0,0 +1,138 @@ >>>> +/* >>>> + * DMA memory preregistration >>>> + * >>>> + * Authors: >>>> + * Alexey Kardashevskiy >>>> + * >>>> + * This work is licensed under the terms of the GNU GPL, version 2. See >>>> + * the COPYING file in the top-level directory. >>>> + */ >>>> + >>>> +#include "qemu/osdep.h" >>>> +#include >>>> +#include >>>> + >>>> +#include "hw/vfio/vfio-common.h" >>>> +#include "hw/vfio/vfio.h" >>>> +#include "qemu/error-report.h" >>>> +#include "trace.h" >>>> + >>>> +static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section) >>>> +{ >>>> + return (!memory_region_is_ram(section->mr) && >>>> + !memory_region_is_iommu(section->mr)) || >>>> + memory_region_is_skip_dump(section->mr); >>>> +} >>>> + >>>> +static void vfio_prereg_listener_region_add(MemoryListener *listener, >>>> + MemoryRegionSection *section) >>>> +{ >>>> + VFIOMemoryListener *vlistener = container_of(listener, VFIOMemoryListener, >>>> + listener); >>>> + VFIOContainer *container = vlistener->container; >>>> + hwaddr iova; >>>> + Int128 llend; >>>> + int ret; >>>> + hwaddr page_mask = qemu_real_host_page_mask; >>>> + struct vfio_iommu_spapr_register_memory reg = { >>>> + .argsz = sizeof(reg), >>>> + .flags = 0, >>>> + }; >>>> + >>>> + if (vfio_prereg_listener_skipped_section(section)) { >>>> + trace_vfio_listener_region_add_skip( >>>> + section->offset_within_address_space, >>>> + section->offset_within_address_space + >>>> + int128_get64(int128_sub(section->size, int128_one()))); >>>> + return; >>>> + } >>> >>> You should probably explicitly check for IOMMU regions and abort if >>> you find one. An IOMMU AS appearing within address_space_memory would >>> be bad news. >> >> >> Oh, vfio_prereg_listener_skipped_section() allows memory_region_is_iommu(), >> I'll remove it. > > Well, that's part. > > But IOMMU regions appearing here shouldn't just be ignored - they > should be treated as a fatal error. Is this always an error when they appear in &address_space_memory? Because if it is so, we should do this check somewhere in memory_region_transaction_commit() (memory_region_add_subregion() cannot check this, there is no AS). -- Alexey