From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42564) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YfiOY-0000r1-LR for qemu-devel@nongnu.org; Wed, 08 Apr 2015 01:19:24 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YfiOV-0000Il-BV for qemu-devel@nongnu.org; Wed, 08 Apr 2015 01:19:22 -0400 Date: Wed, 8 Apr 2015 15:11:00 +1000 From: David Gibson Message-ID: <20150408051100.GJ28909@voom.redhat.com> References: <1427779727-13353-1-git-send-email-aik@ozlabs.ru> <1427779727-13353-7-git-send-email-aik@ozlabs.ru> <20150408021528.GD28909@voom.redhat.com> <5524A908.2020108@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="IJFRpmOek+ZRSQoz" Content-Disposition: inline In-Reply-To: <5524A908.2020108@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH qemu v5 06/12] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Alexander Graf --IJFRpmOek+ZRSQoz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Apr 08, 2015 at 02:05:28PM +1000, Alexey Kardashevskiy wrote: > On 04/08/2015 12:15 PM, David Gibson wrote: > >On Tue, Mar 31, 2015 at 04:28:41PM +1100, Alexey Kardashevskiy wrote: > >>This makes use of the new "memory registering" feature. The idea is > >>to provide the guest ability to notify the host kernel about pages which > > > >AFAICT it's not really the guest informing the host, just qemu > >informing the host. If I'm reading the code correctly, qemu registers > >all RAM regionns, without guest intervention. >=20 > Right. Wrong commit log, I'll fix it. >=20 >=20 > >>are going to be used for DMA. Having this information, the host kernel > >>can pin them all once per user process, do locked pages accounting (onc= e) > >>and not spent time on doing that in real time with possible failures > >>which cannot be handled nicely in some cases. > >> > >>This adds a guest RAM memory listener which notifies a VFIO container > >>about memory which needs to be pinned/unpinned. VFIO MMIO regions > >>(i.e. "skip dump" regions) are skipped. > >> > >>The feature is only enabled for SPAPR IOMMU v2. The host kernel changes > >>are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this do= es > >>not call it when v2 is detected and enabled. > > > >So I'm clear, the v2 just represents a new userspace<->host kernel > >interface for controlling the IOMMU, doesn't it? It doesn't change > >the guest visible IOMMU interface and doesn't represent different > >IOMMU hardware? >=20 >=20 > Exactly true. >=20 >=20 > >>Signed-off-by: Alexey Kardashevskiy > >>--- > >>Changes: > >>v5: > >>* simplified the patch > >>* added trace points > >>* added round_up() for the size > >>* SPAPR IOMMU v2 used > >>--- > >> hw/vfio/common.c | 26 +++++++++----- > >> hw/vfio/spapr.c | 79 ++++++++++++++++++++++++++++++++++= +++++++-- > >> include/hw/vfio/vfio-common.h | 5 ++- > >> trace-events | 1 + > >> 4 files changed, 100 insertions(+), 11 deletions(-) > >> > >>diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>index a71f881..e35e478 100644 > >>--- a/hw/vfio/common.c > >>+++ b/hw/vfio/common.c > >>@@ -577,14 +577,18 @@ static int vfio_connect_container(VFIOGroup *grou= p, AddressSpace *as) > >> > >> container->iommu_data.type1.initialized =3D true; > >> > >>- } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) { > >>+ } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) || > >>+ ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU= )) { > >>+ bool v2 =3D !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v= 2_IOMMU); > >>+ > >> ret =3D ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd); > >> if (ret) { > >> error_report("vfio: failed to set group container: %m"); > >> ret =3D -errno; > >> goto free_container_exit; > >> } > >>- ret =3D ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU); > >>+ ret =3D ioctl(fd, VFIO_SET_IOMMU, > >>+ v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU); > >> if (ret) { > >> error_report("vfio: failed to set iommu for container: %m= "); > >> ret =3D -errno; > >>@@ -596,14 +600,20 @@ static int vfio_connect_container(VFIOGroup *grou= p, AddressSpace *as) > >> * when container fd is closed so we do not call it explicitly > >> * in this file. > >> */ > >>- ret =3D ioctl(fd, VFIO_IOMMU_ENABLE); > >>- if (ret) { > >>- error_report("vfio: failed to enable container: %m"); > >>- ret =3D -errno; > >>- goto free_container_exit; > >>+ if (!v2) { > >>+ ret =3D ioctl(fd, VFIO_IOMMU_ENABLE); > >>+ if (ret) { > >>+ error_report("vfio: failed to enable container: %m"); > >>+ ret =3D -errno; > >>+ goto free_container_exit; > >>+ } > >> } > >> > >>- spapr_memory_listener_register(container); > >>+ ret =3D spapr_memory_listener_register(container, v2 ? 2 : 1); > >>+ if (ret) { > >>+ error_report("vfio: RAM memory listener initialization fai= led for container"); > >>+ goto listener_release_exit; > >>+ } > >> > >> } else { > >> error_report("vfio: No available IOMMU models"); > >>diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c > >>index 5f79194..a670907 100644 > >>--- a/hw/vfio/spapr.c > >>+++ b/hw/vfio/spapr.c > >>@@ -17,6 +17,9 @@ > >> * along with this program; if not, see . > >> */ > >> > >>+#include > >>+#include > >>+ > >> #include "hw/vfio/vfio-common.h" > >> #include "qemu/error-report.h" > >> #include "trace.h" > >>@@ -211,16 +214,88 @@ static const MemoryListener vfio_spapr_memory_lis= tener =3D { > >> .region_del =3D vfio_spapr_listener_region_del, > >> }; > >> > >>+static void vfio_ram_do_region(VFIOContainer *container, > >>+ MemoryRegionSection *section, unsigned l= ong req) > >>+{ > >>+ int ret; > >>+ struct vfio_iommu_spapr_register_memory reg =3D { .argsz =3D sizeo= f(reg) }; > >>+ > >>+ if (!memory_region_is_ram(section->mr) || > >>+ memory_region_is_skip_dump(section->mr)) { > >>+ return; > >>+ } > >>+ > >>+ reg.vaddr =3D (__u64) memory_region_get_ram_ptr(section->mr) + > >>+ section->offset_within_region; > > > >Is section->offset_within_region always page aligned? >=20 >=20 > I think so as it is RAM. qemu_ram_alloc_from_file() and > qemu_ram_alloc_internal() (called from memory_region_init_ram()) align. Ok. It just seems a but odd to me that offset is guaranteed page aligned, but size is not. An assert here might help to document the assumptions you're making. > >>+ reg.size =3D ROUND_UP(int128_get64(section->size), TARGET_PAGE_SIZ= E); > >>+ > >>+ ret =3D ioctl(container->fd, req, ®); > >>+ trace_vfio_ram_register(_IOC_NR(req) - VFIO_BASE, reg.vaddr, reg.s= ize, ret); > >>+ > >>+ /* > >>+ * On the initfn path, store the first error in the container so we > >>+ * can gracefully fail. Runtime, there's not much we can do other > >>+ * than throw a hardware error. > >>+ */ > >>+ if (!container->iommu_data.spapr.ram_reg_initialized) { > >>+ if (!container->iommu_data.spapr.ram_reg_error) { > >>+ container->iommu_data.spapr.ram_reg_error =3D ret; > >>+ } > > > >This is pretty clunky, but I don't immediately see a better way. > > > >Also.. won't the return value of ioctl() just be -1 on error, which > >won't tell you much. Do you want to store errno, instead? >=20 > Yes, I do, I'll fix it. Thanks. >=20 >=20 > >>+ } else { > >>+ hw_error("vfio: RAM registering failed, unable to continue"); > >>+ } > >>+} > >>+ > >>+static void vfio_spapr_ram_listener_region_add(MemoryListener *listene= r, > >>+ MemoryRegionSection *se= ction) > >>+{ > >>+ VFIOContainer *container =3D container_of(listener, VFIOContainer, > >>+ iommu_data.spapr.ramlisten= er); > >>+ memory_region_ref(section->mr); > >>+ vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_REGISTER_M= EMORY); > >>+} > >>+ > >>+static void vfio_spapr_ram_listener_region_del(MemoryListener *listene= r, > >>+ MemoryRegionSection *se= ction) > >>+{ > >>+ VFIOContainer *container =3D container_of(listener, VFIOContainer, > >>+ iommu_data.spapr.ramlisten= er); > >>+ memory_region_unref(section->mr); > >>+ vfio_ram_do_region(container, section, VFIO_IOMMU_SPAPR_UNREGISTER= _MEMORY); > >>+} > >>+ > >>+static const MemoryListener vfio_spapr_ram_memory_listener =3D { > >>+ .region_add =3D vfio_spapr_ram_listener_region_add, > >>+ .region_del =3D vfio_spapr_ram_listener_region_del, > >>+}; > >>+ > >> static void vfio_spapr_listener_release(VFIOContainer *container) > >> { > >> memory_listener_unregister(&container->iommu_data.spapr.listener); > >> } > >> > >>-void spapr_memory_listener_register(VFIOContainer *container) > >>+static void vfio_spapr_listener_release_v2(VFIOContainer *container) > >>+{ > >>+ memory_listener_unregister(&container->iommu_data.spapr.listener); > >>+ vfio_spapr_listener_release(container); > >>+} > >>+ > >>+int spapr_memory_listener_register(VFIOContainer *container, int ver) > >> { > >> container->iommu_data.spapr.listener =3D vfio_spapr_memory_listen= er; > >> container->iommu_data.release =3D vfio_spapr_listener_release; > >>- > >> memory_listener_register(&container->iommu_data.spapr.listener, > >> container->space->as); > >>+ if (ver < 2) { > >>+ return 0; > >>+ } > >>+ > >>+ container->iommu_data.spapr.ramlistener =3D vfio_spapr_ram_memory_= listener; > >>+ container->iommu_data.release =3D vfio_spapr_listener_release_v2; > >>+ memory_listener_register(&container->iommu_data.spapr.ramlistener, > >>+ &address_space_memory); > >>+ > >>+ container->iommu_data.spapr.ram_reg_initialized =3D true; > >>+ > >>+ return container->iommu_data.spapr.ram_reg_error; > >> } > >>diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-commo= n.h > >>index d0b831c..b5ef446 100644 > >>--- a/include/hw/vfio/vfio-common.h > >>+++ b/include/hw/vfio/vfio-common.h > >>@@ -71,6 +71,9 @@ typedef struct VFIOType1 { > >> > >> typedef struct VFIOSPAPR { > >> MemoryListener listener; > >>+ MemoryListener ramlistener; > >>+ int ram_reg_error; > >>+ bool ram_reg_initialized; > >> } VFIOSPAPR; > >> > >> typedef struct VFIOContainer { > >>@@ -156,6 +159,6 @@ extern int vfio_dma_unmap(VFIOContainer *container, > >> hwaddr iova, ram_addr_t size); > >> bool vfio_listener_skipped_section(MemoryRegionSection *section); > >> > >>-extern void spapr_memory_listener_register(VFIOContainer *container); > >>+extern int spapr_memory_listener_register(VFIOContainer *container, in= t ver); > >> > >> #endif /* !HW_VFIO_VFIO_COMMON_H */ > >>diff --git a/trace-events b/trace-events > >>index 1231ba4..2739140 100644 > >>--- a/trace-events > >>+++ b/trace-events > >>@@ -1563,6 +1563,7 @@ vfio_disconnect_container(int fd) "close containe= r->fd=3D%d" > >> vfio_put_group(int fd) "close group->fd=3D%d" > >> vfio_get_device(const char * name, unsigned int flags, unsigned int n= um_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs:= %u" > >> vfio_put_base_device(int fd) "close vdev->fd=3D%d" > >>+vfio_ram_register(int req, uint64_t va, uint64_t size, int ret) "req= =3D%d va=3D%"PRIx64" size=3D%"PRIx64" ret=3D%d" > >> > >> #hw/acpi/memory_hotplug.c > >> mhp_acpi_invalid_slot_selected(uint32_t slot) "0x%"PRIx32 > > >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --IJFRpmOek+ZRSQoz Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJVJLhkAAoJEGw4ysog2bOSrNEQAKjoPH02oMgO7mIJZjkboRJs hjc5IrXmKxQYjmysxuveFYJ5OMnEq9p3GufY6SWRfGKswRYjdPOSpO9UV+0TFNQq /2beEvya0hbmojm2nyIGsXcF7pEjyNhc7Qpu1kzn0hbJex5yCTlhJJ+dDo78YQLg 01lwaD4T0QsY7k/juf7JdEb2ZOTQgNTYDEKi2T3BywkznIxWe+xH2JR26Y0iCaCP 2e6hGkiVgxrRUunzC9RCwaZhLeZ4T9G1UvDLTGJ5W+oN30e4F2l+m5hXTbQev76X LSoh5vXcvFTyqCiOTfIRWv5N+4ZRTs0aihN0t/sq9T8dLesILVu41Fx345P0sbT9 bGg5/mUX9z09cWW3F09jguMxavT3mYmHuvOxuv69iqLEY96Pl8JgJNs2nESezwl+ 2k7nGaiMHcPxTI9zl4oXyHckgHVtbm1cjWaiUeitJa4BgPB4kbcuFPitMgLJXOTs IVCuvpn3/2EB1TPfPyJzsU/PmyUgCG2BBfelrZNkQ9PrBadRhoQqRUgmMyJ0La71 ZE2u3i9qljqn6qstw//yOL/NRNiVDJpdZmKipw/IhAX7AUWmllaifvrqkjUuX32m LF8b69zZ3eso4ea5XCP/cbZATwLPxwAGMSmx0tZlMPG41nK+e3KLCTUlLRuPvR0i 4eZ0ASZNrPqc6Xm7+VYp =xSOL -----END PGP SIGNATURE----- --IJFRpmOek+ZRSQoz--