From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49026) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1at5rS-0003as-CS for qemu-devel@nongnu.org; Thu, 21 Apr 2016 00:05:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1at5rQ-0005KL-ON for qemu-devel@nongnu.org; Thu, 21 Apr 2016 00:05:02 -0400 Date: Thu, 21 Apr 2016 14:03:59 +1000 From: David Gibson Message-ID: <20160421040359.GI1133@voom> References: <1459762426-18440-1-git-send-email-aik@ozlabs.ru> <1459762426-18440-17-git-send-email-aik@ozlabs.ru> <20160407011024.GG16485@voom.fritz.box> <57174F4D.9010003@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="kH8JNVvasRCCW1Oz" Content-Disposition: inline In-Reply-To: <57174F4D.9010003@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH qemu v15 16/17] vfio/spapr: Use VFIO_SPAPR_TCE_v2_IOMMU List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alex Williamson , Alexander Graf --kH8JNVvasRCCW1Oz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Apr 20, 2016 at 07:43:41PM +1000, Alexey Kardashevskiy wrote: > On 04/07/2016 11:10 AM, David Gibson wrote: > >Subject doesn't seem quite right, since you added at least minimal > >support for the SPAPRv2 IOMMU in the prereg patch. > > > >On Mon, Apr 04, 2016 at 07:33:45PM +1000, Alexey Kardashevskiy wrote: > >>New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. > >>This adds ability to VFIO common code to dynamically allocate/remove > >>DMA windows in the host kernel when new VFIO container is added/removed. > >> > >>This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_add > >>and adds just created IOMMU into the host IOMMU list; the opposite > >>action is taken in vfio_listener_region_del. > >> > >>When creating a new window, this uses euristic to decide on the TCE tab= le > >>levels number. > > > >"heuristic" has an 'h' (yes, English spelling is stupid[0]). > > > >[0] The historical reasons for that are kind of fascinating, though. >=20 > Tried googling, could not spot quickly the reasoning, any hints what to > google for? Or just a link with an explanation? :) That wasn't a comment about heuristic specifically, but english spelling in general. I got most of the fascinating stuff I've seen =66rom: http://www.amazon.com/Spell-Out-Enthralling-Extraordinary-Spelling/dp/12500= 56128/ref=3Dla_B000AP940C_1_5/189-2461131-2789630?s=3Dbooks&ie=3DUTF8&qid= =3D1461211129&sr=3D1-5 > >>This should cause no guest visible change in behavior. > >> > >>Signed-off-by: Alexey Kardashevskiy > >>--- > >>Changes: > >>v14: > >>* new to the series > >> > >>--- > >>TODO: > >>* export levels to PHB > >>--- > >> hw/vfio/common.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++= ++++----- > >> trace-events | 2 + > >> 2 files changed, 107 insertions(+), 10 deletions(-) > >> > >>diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>index 5e5b77c..57a51df 100644 > >>--- a/hw/vfio/common.c > >>+++ b/hw/vfio/common.c > >>@@ -279,6 +279,14 @@ static int vfio_host_win_add(VFIOContainer *contai= ner, > >> return 0; > >> } > >> > >>+static void vfio_host_win_del(VFIOContainer *container, hwaddr min_iov= a) > >>+{ > >>+ VFIOHostDMAWindow *hostwin =3D vfio_host_win_lookup(container, min= _iova, 1); > >>+ > >>+ g_assert(hostwin); > >>+ QLIST_REMOVE(hostwin, hiommu_next); > >>+} > >>+ > >> static bool vfio_listener_skipped_section(MemoryRegionSection *sectio= n) > >> { > >> return (!memory_region_is_ram(section->mr) && > >>@@ -392,6 +400,63 @@ static void vfio_listener_region_add(MemoryListene= r *listener, > >> } > >> end =3D int128_get64(llend); > >> > >>+ if (container->iommu_type =3D=3D VFIO_SPAPR_TCE_v2_IOMMU) { > > > >I think the "add region" path could do with being split out into a > >different function - vfio_listener_region_add() is getting pretty > >huge. >=20 > It is big but not huge and I am trying to avoid having functions with > "spapr" in their names in common.c as once they appear, we will start hav= ing > a discussion if they should move to a separate file and if they do, then = may > be some other code should too, etc... I don't think it needs to be a "spapr" named function. The spapr backend is the only one to support it for now, but other iommus could support dynamic windows in future (for example, if I ever get time to implement the "type2" converged interface I was thinking about, I'd look to include that). > >>+ unsigned entries, pages, pagesize =3D qemu_real_host_page_size; > >>+ struct vfio_iommu_spapr_tce_create create =3D { .argsz =3D siz= eof(create) }; > >>+ > >>+ trace_vfio_listener_region_add_iommu(iova, end - 1); > >>+ if (section->mr->iommu_ops) { > >>+ pagesize =3D section->mr->iommu_ops->get_page_sizes(sectio= n->mr); > > > >Since you're querying the guest IOMMU here, I assume pagesize is > >supposed to represent *guest* IOMMU pagesizes, in which case it should > >default to TARGET_PAGE_SIZE, instead of qemu_real_host_page_size. > >(didn't you already have a function which implemented that fallback?) >=20 > Yes, will use the helper. >=20 >=20 > >>+ } > >>+ /* > >>+ * FIXME: For VFIO iommu types which have KVM acceleration to > >>+ * avoid bouncing all map/unmaps through qemu this way, this > >>+ * would be the right place to wire that up (tell the KVM > >>+ * device emulation the VFIO iommu handles to use). > >>+ */ > >>+ create.window_size =3D int128_get64(section->size); > >>+ create.page_shift =3D ctz64(pagesize); > >>+ /* > >>+ * SPAPR host supports multilevel TCE tables, there is some > >>+ * euristic to decide how many levels we want for our table: > > > >s/some euristic/a heuristic/ > > > >>+ * 0..64 =3D 1; 65..4096 =3D 2; 4097..262144 =3D 3; 262145.. = =3D 4 > >>+ */ > >>+ entries =3D create.window_size >> create.page_shift; > >>+ pages =3D (entries * sizeof(uint64_t)) / getpagesize(); > >>+ create.levels =3D ctz64(pow2ceil(pages) - 1) / 6 + 1; > >>+ > >>+ if (vfio_host_win_lookup(container, create.start_addr, > >>+ create.start_addr + create.window_siz= e - 1)) { > >>+ goto fail; > > > >Hmm.. if you successfully look up a host window, it seems to me you > >shouldn't fail, but in fact don't even need to create a new window > >(the removal path gets harder though, because you need to check if any > >guest window requires that host window). >=20 >=20 > At the moment if the window is there, it is failure in the environment I = am > testing it in. And, having a host kernel which cannot allocate and map > windows randomly, it is unlikely that I'll have a setup where this spot > won't mean that something went wrong. x86 case needs lot more than this > anyway. Hm, I suppose so. >=20 >=20 > >Requiring that the host windows exactly match the guest windows is > >probably ok for a first version - except that in that case any overlap > >should cause a failure, not just a complete inclusion. >=20 > Yes, for now this should fail too, I'll fix it. >=20 >=20 > > > >>+ } > >>+ > >>+ ret =3D ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &cre= ate); > >>+ if (ret) { > >>+ error_report("Failed to create a window, ret =3D %d (%m)",= ret); > >>+ goto fail; > >>+ } > >>+ > >>+ if (create.start_addr !=3D section->offset_within_address_spac= e) { > >>+ struct vfio_iommu_spapr_tce_remove remove =3D { > >>+ .argsz =3D sizeof(remove), > >>+ .start_addr =3D create.start_addr > >>+ }; > >>+ error_report("Host doesn't support DMA window at %"HWADDR_= PRIx", must be %"PRIx64, > >>+ section->offset_within_address_space, > >>+ create.start_addr); > >>+ ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); > >>+ ret =3D -EINVAL; > >>+ goto fail; > >>+ } > >>+ trace_vfio_spapr_create_window(create.page_shift, > >>+ create.window_size, > >>+ create.start_addr); > >>+ > >>+ vfio_host_win_add(container, create.start_addr, > >>+ create.start_addr + create.window_size - 1, > >>+ 1ULL << create.page_shift); > >>+ } > >>+ > >> if (!vfio_host_win_lookup(container, iova, end - 1)) { > >> error_report("vfio: IOMMU container %p can't map guest IOVA r= egion" > >> " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, > >>@@ -525,6 +590,22 @@ static void vfio_listener_region_del(MemoryListene= r *listener, > >> container, iova, end - iova, ret); > >> } > >> > >>+ if (container->iommu_type =3D=3D VFIO_SPAPR_TCE_v2_IOMMU) { > >>+ struct vfio_iommu_spapr_tce_remove remove =3D { > >>+ .argsz =3D sizeof(remove), > >>+ .start_addr =3D section->offset_within_address_space, > >>+ }; > >>+ ret =3D ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &rem= ove); > >>+ if (ret) { > >>+ error_report("Failed to remove window at %"PRIx64, > >>+ remove.start_addr); > >>+ } > >>+ > >>+ vfio_host_win_del(container, section->offset_within_address_sp= ace); > >>+ > >>+ trace_vfio_spapr_remove_window(remove.start_addr); > >>+ } > >>+ > >> if (iommu && iommu->iommu_ops && iommu->iommu_ops->vfio_stop) { > >> iommu->iommu_ops->vfio_stop(section->mr); > >> } > >>@@ -915,11 +996,6 @@ static int vfio_connect_container(VFIOGroup *group= , AddressSpace *as) > >> } > >> } > >> > >>- /* > >>- * This only considers the host IOMMU's 32-bit window. At > >>- * some point we need to add support for the optional 64-bit > >>- * window and dynamic windows > >>- */ > >> info.argsz =3D sizeof(info); > >> ret =3D ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info); > >> if (ret) { > >>@@ -928,11 +1004,30 @@ static int vfio_connect_container(VFIOGroup *gro= up, AddressSpace *as) > >> goto listener_release_exit; > >> } > >> > >>- /* The default table uses 4K pages */ > >>- vfio_host_win_add(container, info.dma32_window_start, > >>- info.dma32_window_start + > >>- info.dma32_window_size - 1, > >>- 0x1000); > >>+ if (v2) { > >>+ /* > >>+ * There is a default window in just created container. > >>+ * To make region_add/del simpler, we better remove this > >>+ * window now and let those iommu_listener callbacks > >>+ * create/remove them when needed. > >>+ */ > >>+ struct vfio_iommu_spapr_tce_remove remove =3D { > >>+ .argsz =3D sizeof(remove), > >>+ .start_addr =3D info.dma32_window_start, > >>+ }; > >>+ ret =3D ioctl(fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); > >>+ if (ret) { > >>+ error_report("vfio: VFIO_IOMMU_SPAPR_TCE_REMOVE failed= : %m"); > >>+ ret =3D -errno; > >>+ goto free_container_exit; > >>+ } > >>+ } else { > >>+ /* The default table uses 4K pages */ > >>+ vfio_host_win_add(container, info.dma32_window_start, > >>+ info.dma32_window_start + > >>+ info.dma32_window_size - 1, > >>+ 0x1000); > >>+ } > >> } else { > >> error_report("vfio: No available IOMMU models"); > >> ret =3D -EINVAL; > >>diff --git a/trace-events b/trace-events > >>index 23ca0b9..5c651fa 100644 > >>--- a/trace-events > >>+++ b/trace-events > >>@@ -1738,6 +1738,8 @@ vfio_region_finalize(const char *name, int index)= "Device %s, region %d" > >> vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region= %s mmaps enabled: %d" > >> vfio_ram_register(uint64_t va, uint64_t size, int ret) "va=3D%"PRIx64= " size=3D%"PRIx64" ret=3D%d" > >> vfio_ram_unregister(uint64_t va, uint64_t size, int ret) "va=3D%"PRIx= 64" size=3D%"PRIx64" ret=3D%d" > >>+vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift= =3D0x%x winsize=3D0x%"PRIx64" offset=3D0x%"PRIx64 > >>+vfio_spapr_remove_window(uint64_t off) "offset=3D%"PRIx64 > >> > >> # hw/vfio/platform.c > >> vfio_platform_base_device_init(char *name, int groupid) "%s belongs t= o group #%d" > > >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --kH8JNVvasRCCW1Oz Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJXGFEvAAoJEGw4ysog2bOSm0sP/0angPGBzXWdDHCrvEBwo7FF SNAi1TufJ27QCP9x8xhenlCYJxECDZrc2HS4CPM7Uc8QddCFq0w75r+IEtOe4aPA XzaYoKuYJle65shYa5Pjmi8so+G4pvIzRDVIakFlgEcFAzQu3Z0u/ct0JzSRc88v st1LOrUCqibseiHkjAfCEH6mmjEUxeQhZeZ5nKBWpEzLt0QVYXh5KphcrbeS+wOc OH1rLro0BeEaVLgx5kECan3UDvEujYpk3DIJqcC2kMJmxsulKiWla1SPV6V4FGy6 LB3laF+f+scJxI1aD3ajFamr3crctL+0VEWcvoDGiWhcT7gnIixSW9lIq80kBEnz GeoKzzg3zSHyb43uRU4BFWieto0he6M0nx6Aq0FNPPqbsC1FnDaMHkc9MmnpNMNM XA3qPeGYeYEfL5wsw3IwOqvG9V724xUL8OqQIrtW1u+RvoiFyhmmUi0vxDPdZRI9 1E+fBjp4KHZgJqQqU70kOQq2oMY1UKHQmmZV9H7f6KSmtHZmwzfdujMUsWwfd7rx UJMBGg+FndZLumJ9Fg1cMmdgHhJyg/WSMvYWRrfB47f8rNK7/1Z/8euhS5gTYW+s rjtoMQTjJE+XHRoCNS6vWUItwLbOywMOPO7dJtSoOggRucNVA4O4WD8N8ek0LGB8 p3/wRtm+VC+6q9M9zpyT =MGia -----END PGP SIGNATURE----- --kH8JNVvasRCCW1Oz--