From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ozlabs.org (ozlabs.org [IPv6:2401:3900:2:1::2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3qPPpq5mL2zDq7K for ; Tue, 15 Mar 2016 17:29:23 +1100 (AEDT) Date: Tue, 15 Mar 2016 17:04:00 +1100 From: David Gibson To: Alexey Kardashevskiy Cc: linuxppc-dev@lists.ozlabs.org, Paul Mackerras , Alex Williamson , kvm-ppc@vger.kernel.org, kvm@vger.kernel.org Subject: Re: [PATCH kernel 9/9] KVM: PPC: VFIO device: support SPAPR TCE Message-ID: <20160315060400.GK15272@voom.fritz.box> References: <1457322077-26640-1-git-send-email-aik@ozlabs.ru> <1457322077-26640-10-git-send-email-aik@ozlabs.ru> <20160309054544.GM22546@voom.fritz.box> <56DFEACC.7030508@ozlabs.ru> <20160310052108.GV22546@voom.fritz.box> <56E1FEBE.5090602@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7AwgMNpd3VkAVXjS" In-Reply-To: <56E1FEBE.5090602@ozlabs.ru> List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , --7AwgMNpd3VkAVXjS Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Mar 11, 2016 at 10:09:50AM +1100, Alexey Kardashevskiy wrote: > On 03/10/2016 04:21 PM, David Gibson wrote: > >On Wed, Mar 09, 2016 at 08:20:12PM +1100, Alexey Kardashevskiy wrote: > >>On 03/09/2016 04:45 PM, David Gibson wrote: > >>>On Mon, Mar 07, 2016 at 02:41:17PM +1100, Alexey Kardashevskiy wrote: > >>>>sPAPR TCE IOMMU is para-virtualized and the guest does map/unmap > >>>>via hypercalls which take a logical bus id (LIOBN) as a target IOMMU > >>>>identifier. LIOBNs are made up, advertised to guest systems and > >>>>linked to IOMMU groups by the user space. > >>>>In order to enable acceleration for IOMMU operations in KVM, we need > >>>>to tell KVM the information about the LIOBN-to-group mapping. > >>>> > >>>>For that, a new KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN parameter > >>>>is added which accepts: > >>>>- a VFIO group fd and IO base address to find the actual hardware > >>>>TCE table; > >>>>- a LIOBN to assign to the found table. > >>>> > >>>>Before notifying KVM about new link, this check the group for being > >>>>registered with KVM device in order to release them at unexpected KVM > >>>>finish. > >>>> > >>>>This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user > >>>>space. > >>>> > >>>>While we are here, this also fixes VFIO KVM device compiling to let it > >>>>link to a KVM module. > >>>> > >>>>Signed-off-by: Alexey Kardashevskiy > >>>>--- > >>>> Documentation/virtual/kvm/devices/vfio.txt | 21 +++++- > >>>> arch/powerpc/kvm/Kconfig | 1 + > >>>> arch/powerpc/kvm/Makefile | 5 +- > >>>> arch/powerpc/kvm/powerpc.c | 1 + > >>>> include/uapi/linux/kvm.h | 9 +++ > >>>> virt/kvm/vfio.c | 106 ++++++++++++++++++= +++++++++++ > >>>> 6 files changed, 140 insertions(+), 3 deletions(-) > >>>> > >>>>diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentati= on/virtual/kvm/devices/vfio.txt > >>>>index ef51740..c0d3eb7 100644 > >>>>--- a/Documentation/virtual/kvm/devices/vfio.txt > >>>>+++ b/Documentation/virtual/kvm/devices/vfio.txt > >>>>@@ -16,7 +16,24 @@ Groups: > >>>> > >>>> KVM_DEV_VFIO_GROUP attributes: > >>>> KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device track= ing > >>>>+ kvm_device_attr.addr points to an int32_t file descriptor > >>>>+ for the VFIO group. > >>> > >>>AFAICT these changes are accurate for VFIO as it is already, in which > >>>case it might be clearer to put them in a separate patch. > >>> > >>>> KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device = tracking > >>>>+ kvm_device_attr.addr points to an int32_t file descriptor > >>>>+ for the VFIO group. > >>>> > >>>>-For each, kvm_device_attr.addr points to an int32_t file descriptor > >>>>-for the VFIO group. > >>>>+ KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: sets a liobn for a VFIO gr= oup > >>>>+ kvm_device_attr.addr points to a struct: > >>>>+ struct kvm_vfio_spapr_tce_liobn { > >>>>+ __u32 argsz; > >>>>+ __s32 fd; > >>>>+ __u32 liobn; > >>>>+ __u8 pad[4]; > >>>>+ __u64 start_addr; > >>>>+ }; > >>>>+ where > >>>>+ @argsz is the size of kvm_vfio_spapr_tce_liobn; > >>>>+ @fd is a file descriptor for a VFIO group; > >>>>+ @liobn is a logical bus id to be associated with the group; > >>>>+ @start_addr is a DMA window offset on the IO (PCI) bus > >>> > >>>For the cause of DDW and multiple windows, I'm assuming you can call > >>>this multiple times with different LIOBNs and the same IOMMU group? > >> > >> > >>Yes. It is called twice per each group (when DDW is activated) - for 32= bit > >>and 64bit windows, this is why @start_addr is there. > >> > >> > >>>>diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig > >>>>index 1059846..dfa3488 100644 > >>>>--- a/arch/powerpc/kvm/Kconfig > >>>>+++ b/arch/powerpc/kvm/Kconfig > >>>>@@ -65,6 +65,7 @@ config KVM_BOOK3S_64 > >>>> select KVM > >>>> select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE > >>>> select SPAPR_TCE_IOMMU if IOMMU_SUPPORT > >>>>+ select KVM_VFIO if VFIO > >>>> ---help--- > >>>> Support running unmodified book3s_64 and book3s_32 guest kernels > >>>> in virtual machines on book3s_64 host processors. > >>>>diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile > >>>>index 7f7b6d8..71f577c 100644 > >>>>--- a/arch/powerpc/kvm/Makefile > >>>>+++ b/arch/powerpc/kvm/Makefile > >>>>@@ -8,7 +8,7 @@ ccflags-y :=3D -Ivirt/kvm -Iarch/powerpc/kvm > >>>> KVM :=3D ../../../virt/kvm > >>>> > >>>> common-objs-y =3D $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ > >>>>- $(KVM)/eventfd.o $(KVM)/vfio.o > >>>>+ $(KVM)/eventfd.o > >>> > >>>Please don't disable the VFIO device for the non-book3s case. I added > >>>it (even though it didn't do anything until now) so that libvirt > >>>wouldn't choke when it finds it's not available. Obviously the new > >>>ioctl needs to be only for the right IOMMU setup, but the device > >>>itself should be available always. > >> > >>Ah. Ok, I'll fix this. I just wanted to be able to compile kvm as a mod= ule. > >> > >> > >>>> CFLAGS_e500_mmu.o :=3D -I. > >>>> CFLAGS_e500_mmu_host.o :=3D -I. > >>>>@@ -87,6 +87,9 @@ endif > >>>> kvm-book3s_64-objs-$(CONFIG_KVM_XICS) +=3D \ > >>>> book3s_xics.o > >>>> > >>>>+kvm-book3s_64-objs-$(CONFIG_KVM_VFIO) +=3D \ > >>>>+ $(KVM)/vfio.o \ > >>>>+ > >>>> kvm-book3s_64-module-objs +=3D \ > >>>> $(KVM)/kvm_main.o \ > >>>> $(KVM)/eventfd.o \ > >>>>diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > >>>>index 19aa59b..63f188d 100644 > >>>>--- a/arch/powerpc/kvm/powerpc.c > >>>>+++ b/arch/powerpc/kvm/powerpc.c > >>>>@@ -521,6 +521,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm,= long ext) > >>>> #ifdef CONFIG_PPC_BOOK3S_64 > >>>> case KVM_CAP_SPAPR_TCE: > >>>> case KVM_CAP_SPAPR_TCE_64: > >>>>+ case KVM_CAP_SPAPR_TCE_VFIO: > >>>> case KVM_CAP_PPC_ALLOC_HTAB: > >>>> case KVM_CAP_PPC_RTAS: > >>>> case KVM_CAP_PPC_FIXUP_HCALL: > >>>>diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >>>>index 080ffbf..f1abbea 100644 > >>>>--- a/include/uapi/linux/kvm.h > >>>>+++ b/include/uapi/linux/kvm.h > >>>>@@ -1056,6 +1056,7 @@ struct kvm_device_attr { > >>>> #define KVM_DEV_VFIO_GROUP 1 > >>>> #define KVM_DEV_VFIO_GROUP_ADD 1 > >>>> #define KVM_DEV_VFIO_GROUP_DEL 2 > >>>>+#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN 3 > >>>> > >>>> enum kvm_device_type { > >>>> KVM_DEV_TYPE_FSL_MPIC_20 =3D 1, > >>>>@@ -1075,6 +1076,14 @@ enum kvm_device_type { > >>>> KVM_DEV_TYPE_MAX, > >>>> }; > >>>> > >>>>+struct kvm_vfio_spapr_tce_liobn { > >>>>+ __u32 argsz; > >>>>+ __s32 fd; > >>>>+ __u32 liobn; > >>>>+ __u8 pad[4]; > >>>>+ __u64 start_addr; > >>>>+}; > >>>>+ > >>>> /* > >>>> * ioctls for VM fds > >>>> */ > >>>>diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c > >>>>index 1dd087d..87c771e 100644 > >>>>--- a/virt/kvm/vfio.c > >>>>+++ b/virt/kvm/vfio.c > >>>>@@ -20,6 +20,10 @@ > >>>> #include > >>>> #include "vfio.h" > >>>> > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>>>+#include > >>>>+#endif > >>>>+ > >>>> struct kvm_vfio_group { > >>>> struct list_head node; > >>>> struct vfio_group *vfio_group; > >>>>@@ -60,6 +64,22 @@ static void kvm_vfio_group_put_external_user(struc= t vfio_group *vfio_group) > >>>> symbol_put(vfio_group_put_external_user); > >>>> } > >>>> > >>>>+static int kvm_vfio_external_user_iommu_id(struct vfio_group *vfio_g= roup) > >>>>+{ > >>>>+ int (*fn)(struct vfio_group *); > >>>>+ int ret =3D -1; > >>> > >>>Should this be -ESOMETHING? > >>> > >>>>+ fn =3D symbol_get(vfio_external_user_iommu_id); > >>>>+ if (!fn) > >>>>+ return ret; > >>>>+ > >>>>+ ret =3D fn(vfio_group); > >>>>+ > >>>>+ symbol_put(vfio_external_user_iommu_id); > >>>>+ > >>>>+ return ret; > >>>>+} > >>>>+ > >>>> static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_grou= p) > >>>> { > >>>> long (*fn)(struct vfio_group *, unsigned long); > >>>>@@ -110,6 +130,22 @@ static void kvm_vfio_update_coherency(struct kvm= _device *dev) > >>>> mutex_unlock(&kv->lock); > >>>> } > >>>> > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>>>+static void kvm_vfio_spapr_detach_iommu_group(struct kvm *kvm, > >>>>+ struct vfio_group *vfio_group) > >>> > >>> > >>>Shouldn't this go in the same patch that introduced the attach > >>>function? > >> > >>Having less patches which touch different maintainers areas is better. I > >>cannot avoid touching both PPC KVM and VFIO in this patch but I can in > >>"[PATCH kernel 6/9] KVM: PPC: Associate IOMMU group with guest view of = TCE > >>table". > >> > >> > >>> > >>>>+{ > >>>>+ int group_id; > >>>>+ struct iommu_group *grp; > >>>>+ > >>>>+ group_id =3D kvm_vfio_external_user_iommu_id(vfio_group); > >>>>+ grp =3D iommu_group_get_by_id(group_id); > >>>>+ if (grp) { > >>>>+ kvm_spapr_tce_detach_iommu_group(kvm, grp); > >>>>+ iommu_group_put(grp); > >>>>+ } > >>>>+} > >>>>+#endif > >>>>+ > >>>> static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u6= 4 arg) > >>>> { > >>>> struct kvm_vfio *kv =3D dev->private; > >>>>@@ -186,6 +222,10 @@ static int kvm_vfio_set_group(struct kvm_device = *dev, long attr, u64 arg) > >>>> continue; > >>>> > >>>> list_del(&kvg->node); > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>> > >>>Better to make a no-op version of the call than have to #ifdef at the > >>>callsite. > >> > >>It is questionable. A x86 reader may decide that > >>KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN is implemented for x86 and get > >>confused. > >> > >> > >>> > >>>>+ kvm_vfio_spapr_detach_iommu_group(dev->kvm, > >>>>+ kvg->vfio_group); > >>>>+#endif > >>>> kvm_vfio_group_put_external_user(kvg->vfio_group); > >>>> kfree(kvg); > >>>> ret =3D 0; > >>>>@@ -201,6 +241,69 @@ static int kvm_vfio_set_group(struct kvm_device = *dev, long attr, u64 arg) > >>>> kvm_vfio_update_coherency(dev); > >>>> > >>>> return ret; > >>>>+ > >>>>+#ifdef CONFIG_SPAPR_TCE_IOMMU > >>>>+ case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: { > >>>>+ struct kvm_vfio_spapr_tce_liobn param; > >>>>+ unsigned long minsz; > >>>>+ struct kvm_vfio *kv =3D dev->private; > >>>>+ struct vfio_group *vfio_group; > >>>>+ struct kvm_vfio_group *kvg; > >>>>+ struct fd f; > >>>>+ > >>>>+ minsz =3D offsetofend(struct kvm_vfio_spapr_tce_liobn, > >>>>+ start_addr); > >>>>+ > >>>>+ if (copy_from_user(¶m, (void __user *)arg, minsz)) > >>>>+ return -EFAULT; > >>>>+ > >>>>+ if (param.argsz < minsz) > >>>>+ return -EINVAL; > >>>>+ > >>>>+ f =3D fdget(param.fd); > >>>>+ if (!f.file) > >>>>+ return -EBADF; > >>>>+ > >>>>+ vfio_group =3D kvm_vfio_group_get_external_user(f.file); > >>>>+ fdput(f); > >>>>+ > >>>>+ if (IS_ERR(vfio_group)) > >>>>+ return PTR_ERR(vfio_group); > >>>>+ > >>>>+ ret =3D -ENOENT; > >>> > >>>Shouldn't there be some runtime test for the type of the IOMMU? It's > >>>possible a kernel could be built for a platform supporting multiple > >>>IOMMU types. > >> > >>Well, may make sense but I do not know to test that. The IOMMU type is a > >>VFIO container property, not a group property and here (KVM) we only ha= ve > >>groups. > > > >Which, as mentioned previously, is broken. >=20 > Which I am failing to follow you on this. >=20 > What I am trying to achieve here is pretty much referencing a group so it > cannot be reused. Plus LIOBNs. "Plus LIOBNs" is not a trivial change. You are establishing a linkage =66rom LIOBNs to groups. But that doesn't make sense; if mapping in one (guest) LIOBN affects a group it must affect all groups in the container. i.e. LIOBN->container is the natural mapping, *not* LIOBN to group. > Passing a container fd does not make much > sense here as the VFIO device would walk through groups, reference them a= nd > that is it, there is no locking on VFIO containters and so far there was = no > need to teach KVM about containers. >=20 > What do I miss now? Referencing the groups is essentially just a useful side effect. The important functionality is informing VFIO of the guest LIOBNs; and LIOBNs map to containers, not groups. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --7AwgMNpd3VkAVXjS Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJW56XQAAoJEGw4ysog2bOSOWAQALAVeBIWtoxFNZwk/Lndgh85 t1xLKYD57AMCyVotgl4lCDHK4GpC6pglSdN4K14gF5dho6Z9FJHkW6G2bvqilWTs HA7ItPS4ZIoS3KHudX27XhQWtI/Y1hWz2M910/OJrOQzGPrwb5HPbzj/YXBca4MG JZSsXP456ygEYbokoUVd4JH6TvYNp+OIhXl3IBydz0hiFEUGd0JYPF5C3GIUiQkE kf99BkfoNjhp8svpKNUcesFqyzlawMKrsv3O8n/qit89b4wwgiNX/B6WQ4CBXxdW n2QUxerYpTs9H6YyX9kHWULMaEQCe0QkItaT24EnEz5UkLsR++2rsfPpQ1ycjrcs Z80IBmtS1Ur91IX8WQiM75Af7ewofJJCiwhu3TwrxakDQQMx4RLFP//lQUKAuCtK 07OHcB2Rhj/OBbumfKATspfhh1xeqTgp0RQPMNEjEi09ahaNcjtkTilRsdRv2Y8v QLljrtMCgPhGQG5vOd8zFfzIggIZ8jWOl7f4c7+wiJAfuIMLuyvN9FInUcl9bQP1 94Ryy2CpP1ewQm1ET63M0hG6ObuEDhhYZps/vov1ENtqEOrlxOxTM8pJ4RybeWn6 Ut4Y2FJL5UMr7q54muBbH6a+6V3k8Vc5JqsRN8uxPex4SnR086IOvejHZ9MZXi/z dmh1aL+NLwmZr4ArEckC =QQ+O -----END PGP SIGNATURE----- --7AwgMNpd3VkAVXjS--