From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46795) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZCGl8-0005eP-8E for qemu-devel@nongnu.org; Mon, 06 Jul 2015 20:29:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZCGl6-0008Sm-CX for qemu-devel@nongnu.org; Mon, 06 Jul 2015 20:29:14 -0400 Date: Tue, 7 Jul 2015 10:29:07 +1000 From: David Gibson Message-ID: <20150707002907.GH17857@voom.redhat.com> References: <1436148670-6592-1-git-send-email-aik@ozlabs.ru> <1436148670-6592-14-git-send-email-aik@ozlabs.ru> <1436190148.3909.55.camel@redhat.com> <559A9FEA.1040609@ozlabs.ru> <1436199187.3909.90.camel@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="CEUtFxTsmBsHRLs3" Content-Disposition: inline In-Reply-To: <1436199187.3909.90.camel@redhat.com> Subject: Re: [Qemu-devel] [PATCH qemu v10 13/14] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: Alexey Kardashevskiy , Michael Roth , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Gavin Shan --CEUtFxTsmBsHRLs3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jul 06, 2015 at 10:13:07AM -0600, Alex Williamson wrote: > On Tue, 2015-07-07 at 01:34 +1000, Alexey Kardashevskiy wrote: > > On 07/06/2015 11:42 PM, Alex Williamson wrote: > > > On Mon, 2015-07-06 at 12:11 +1000, Alexey Kardashevskiy wrote: > > >> This makes use of the new "memory registering" feature. The idea is > > >> to provide the userspace ability to notify the host kernel about pag= es > > >> which are going to be used for DMA. Having this information, the host > > >> kernel can pin them all once per user process, do locked pages > > >> accounting (once) and not spent time on doing that in real time with > > >> possible failures which cannot be handled nicely in some cases. > > >> > > >> This adds a guest RAM memory listener which notifies a VFIO container > > >> about memory which needs to be pinned/unpinned. VFIO MMIO regions > > >> (i.e. "skip dump" regions) are skipped. > > >> > > >> The feature is only enabled for SPAPR IOMMU v2. The host kernel chan= ges > > >> are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this= does > > >> not call it when v2 is detected and enabled. > > >> > > >> This does not change the guest visible interface. > > >> > > >> Signed-off-by: Alexey Kardashevskiy > > >> Reviewed-by: David Gibson > > >> --- > > >> Changes: > > >> v9: > > >> * since there is no more SPAPR-specific data in container::iommu_dat= a, > > >> the memory preregistration fields are common and potentially can be = used > > >> by other architectures > > >> > > >> v7: > > >> * in vfio_spapr_ram_listener_region_del(), do unref() after ioctl() > > >> * s'ramlistener'register_listener' > > >> > > >> v6: > > >> * fixed commit log (s/guest/userspace/), added note about no guest v= isible > > >> change > > >> * fixed error checking if ram registration failed > > >> * added alignment check for section->offset_within_region > > >> > > >> v5: > > >> * simplified the patch > > >> * added trace points > > >> * added round_up() for the size > > >> * SPAPR IOMMU v2 used > > >> --- > > >> hw/vfio/common.c | 109 ++++++++++++++++++++++++++++++= ++++++++---- > > >> include/hw/vfio/vfio-common.h | 3 ++ > > >> trace-events | 1 + > > >> 3 files changed, 104 insertions(+), 9 deletions(-) > > >> > > >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > >> index 8eacfd7..0c7ba8c 100644 > > >> --- a/hw/vfio/common.c > > >> +++ b/hw/vfio/common.c > > >> @@ -488,6 +488,76 @@ static void vfio_listener_release(VFIOContainer= *container) > > >> memory_listener_unregister(&container->iommu_data.type1.listen= er); > > >> } > > >> > > >> +static void vfio_ram_do_region(VFIOContainer *container, > > >> + MemoryRegionSection *section, unsigne= d long req) > > >> +{ > > >> + int ret; > > >> + struct vfio_iommu_spapr_register_memory reg =3D { .argsz =3D si= zeof(reg) }; > > > > > > This function is not as general as the name would imply, it's spapr > > > specific due to this. How about vfio_spapr_register_memory() with a > > > bool parameter toggling register vs unregister so we're not passing an > > > arbitrary ioctl number? > >=20 > > Ok. Although I am quite often asked not to do such a thing and rather a= dd 2=20 > > helpers (reg/unreg, do/undo, etc) instead and reuse common bits. >=20 > I'm not a fan of functions that do the reverse process based on a bool > arg either, but I dislike them less than passing an arbitrary ioctl > number for a parameter. The former is ugly, but the latter is difficult > to use and difficult to maintain because it would be subtle later to > spot an unsupported ioctl being passed to the function. >=20 > > >> + > > >> + if (!memory_region_is_ram(section->mr) || > > >> + memory_region_is_skip_dump(section->mr)) { > > >> + return; > > >> + } > > >> + > > >> + if (unlikely((section->offset_within_region & (getpagesize() - = 1)))) { > > > > > > s/getpagesize()/qemu_real_host_page_size/? > >=20 > >=20 > > Oh, right, I guess it reached upstream now. > >=20 > >=20 > > >> + error_report("%s received unaligned region", __func__); > > >> + return; > > >> + } > > >> + > > >> + reg.vaddr =3D (__u64) memory_region_get_ram_ptr(section->mr) + > > >> + section->offset_within_region; > > >> + reg.size =3D ROUND_UP(int128_get64(section->size), TARGET_PAGE_= SIZE); > > >> + > > >> + ret =3D ioctl(container->fd, req, ®); > > >> + trace_vfio_ram_register(_IOC_NR(req) - VFIO_BASE, reg.vaddr, re= g.size, > > >> + ret ? -errno : 0); > > >> + if (!ret) { > > >> + return; > > >> + } > > >> + > > >> + /* > > >> + * On the initfn path, store the first error in the container s= o we > > >> + * can gracefully fail. Runtime, there's not much we can do ot= her > > >> + * than throw a hardware error. > > >> + */ > > >> + if (!container->iommu_data.ram_reg_initialized) { > > >> + if (!container->iommu_data.ram_reg_error) { > > >> + container->iommu_data.ram_reg_error =3D -errno; > > >> + } > > >> + } else { > > >> + hw_error("vfio: RAM registering failed, unable to continue"= ); > > >> + } > > > > > > I'd rather see: > > > > > > if (ret) { > > > if (!container...) { > > > ... > > > } else { > > > ... > > > } > > > } > > > > > > Exiting early on success and otherwise falling into error handling is= a > > > strange code flow. > >=20 > > Ok... vfio_dma_map() does not follow this rule so I thought it is not t= hat=20 > > strict :) >=20 > It would be nice to clean it up there too. >=20 > > >> +} > > >> + > > >> +static void vfio_ram_listener_region_add(MemoryListener *listener, > > >> + MemoryRegionSection *secti= on) > > >> +{ > > >> + VFIOContainer *container =3D container_of(listener, VFIOContain= er, > > >> + iommu_data.register_lis= tener); > > >> + memory_region_ref(section->mr); > > >> + vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_REGISTE= R_MEMORY); > > > > > > vfio_spapr_register_memory(container, section, true); > > > > > >> +} > > >> + > > >> +static void vfio_ram_listener_region_del(MemoryListener *listener, > > >> + MemoryRegionSection *secti= on) > > >> +{ > > >> + VFIOContainer *container =3D container_of(listener, VFIOContain= er, > > >> + iommu_data.register_lis= tener); > > >> + vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_UNREGIS= TER_MEMORY); > > > > > > vfio_spapr_register_memory(container, section, false); > > > > > >> + memory_region_unref(section->mr); > > >> +} > > >> + > > >> +static const MemoryListener vfio_ram_memory_listener =3D { > > >> + .region_add =3D vfio_ram_listener_region_add, > > >> + .region_del =3D vfio_ram_listener_region_del, > > >> +}; > > > > > > These are all spapr specific, please reflect that in the name; > > > vfio_spapr_v2_memory_listener, vfio_spapr_v2_listener_add/del. > >=20 > > ok. > >=20 > >=20 > > > Actually, can't we determine what type of IOMMU we have and make the > > > existing MemoryListener handle either type1 or spapr or spapr-v2? > >=20 > >=20 > > Sorry, I do not follow you here. How? The existing listener listens on = PCI=20 > > address space (at least, on pseries), new one listens on RAM address sp= ace=20 > > (address_space_memory). What do I miss? >=20 > Isn't that simply a difference of the address space the listener is > attached to? Type1 maps RAM, spapr-v1 maps guest IOMMU space and these > are already both handled by the same listener. I think what you're missing is that the spapr code now needs to listen on *both* the RAM and PCI address spaces. On RAM so it can do the preregistration, and on PCI so it can do the actual IOMMU mappings. What might make sense, although it might be better as a later cleanup is to bake into the common code the idea of two listeners - one for new RAM regions, one for new PCI mappings, with the actual actions for each case dependent on the IOMMU type. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --CEUtFxTsmBsHRLs3 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJVmx1TAAoJEGw4ysog2bOSaUYP/iqlnisufo/69VNZB49DlatQ GFnLZykdk8CyPZqjq5IzmQMHwT+7EXVOfBdSZLjTbVFCw5xDAPCjcmHbUb9ps7Up HE4u5mFiVlIPL2XoLGODNcDgCAjnt1kaOHNR3i5tUcrVMEDqg5K4ZcXg1yk6x2iS +S3a5pf5WJafet7mWbzoQsM9KLULSeTlscTcAmkVrjRmL3gLuY4vf/Oub+Jmm2/q Z8Evmgl0q5bHHtiGz2TDRBsqApGMkr/RdmSeTpkzR8Eo12BVKTY+D2ZyJNEZjnPS tsseuRoZ4v26WD7t+JMvQikON1Xq5AgOMxLacJ/j9Sj/jXFmBD74TL50I49znMY4 9IA4vXXrBrFDfFCxxTpQUC0/B2I/aBnQRterUt78tYVXYa+QITeCRIlm5R2ITfbj mxNs0k8EV1DMgY6HwslnIZ+hj/GKLu2Rx03jyr4tp3Ya9+w9cB1jycRYDQG2JksN 8QoD/zi3YrqMGusc6wVeQS0ZOStTAqHbea457Dbl93j6l/BjcezWdpiqWZLsSREw IoRe+bT0sxdtsBDLeipOAKtU9oAwUv+BNk/Ijz5MtPwEBSkos/zg1dOK0acsXKAw 4vYagh3p1g1xL5Lvur7jMmDl55gtCS/xBpVuJyWsT7MSNCPaaLdq4vYynPwCjvP7 yuv4g7WgbsHPlmDVoPq9 =JdWg -----END PGP SIGNATURE----- --CEUtFxTsmBsHRLs3--