From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D028EC10F0E for ; Fri, 12 Apr 2019 06:34:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9509020643 for ; Fri, 12 Apr 2019 06:34:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.b="BwEOfXJ1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726923AbfDLGeR (ORCPT ); Fri, 12 Apr 2019 02:34:17 -0400 Received: from bilbo.ozlabs.org ([203.11.71.1]:58347 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726720AbfDLGeR (ORCPT ); Fri, 12 Apr 2019 02:34:17 -0400 Received: by ozlabs.org (Postfix, from userid 1007) id 44gSm52jhnz9sBr; Fri, 12 Apr 2019 16:34:13 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1555050853; bh=u0Xx5CeVSTrgxXcJg7O2WAWi68RPG8H7M6FQFddFOlM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=BwEOfXJ1294w9m6yxQE3oyO1muD2Q475FX78LOxuy5D3KcdGU8LLrYaxbyYLgBfnr 1vt+N5Xwziris30EpwCA7ZDtxZhhRgUdCxM1hzj8WQVL4zDvkjAp6QmC6w2AS+iafn oCZOVBE4mX1VfzhuXLUXVeUo0k5kC9Qx4OukSFdA= Date: Fri, 12 Apr 2019 16:34:03 +1000 From: David Gibson To: =?iso-8859-1?Q?C=E9dric?= Le Goater Cc: kvm-ppc@vger.kernel.org, Paul Mackerras , kvm@vger.kernel.org Subject: Re: [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release' Message-ID: <20190412063402.GH8005@umbus.fritz.box> References: <20190411135302.27509-1-clg@kaod.org> <20190411135302.27509-3-clg@kaod.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="smOfPzt+Qjm5bNGJ" Content-Disposition: inline In-Reply-To: <20190411135302.27509-3-clg@kaod.org> User-Agent: Mutt/1.11.3 (2019-02-01) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org --smOfPzt+Qjm5bNGJ Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Apr 11, 2019 at 03:53:02PM +0200, C=E9dric Le Goater wrote: > When a P9 sPAPR VM boots, the CAS negotiation process determines which > interrupt mode to use (XICS legacy or XIVE native) and invokes a > machine reset to activate the chosen mode. >=20 > We introduce 'release' methods for the XICS-on-XIVE and the XIVE > native KVM devices which are called when the file descriptor of the > device is closed after the TIMA and ESB pages have been unmapped. > They perform the necessary cleanups : clear the vCPU interrupt > presenters that could be attached and then destroy the device. The > 'release' methods replace the 'destroy' methods as 'destroy' is not > called anymore once 'release' is. Compatibility with older QEMU is > nevertheless maintained. >=20 > This is not considered as a safe operation as the vCPUs are still > running and could be referencing the KVM device through their > presenters. To protect the system from any breakage, the kvmppc_xive > objects representing both KVM devices are now stored in an array under > the VM. Allocation is performed on first usage and memory is freed > only when the VM exits. >=20 > Signed-off-by: C=E9dric Le Goater Reviewed-by: David Gibson Although a nit.. > --- > arch/powerpc/include/asm/kvm_host.h | 1 + > arch/powerpc/kvm/book3s_xive.h | 1 + > arch/powerpc/kvm/book3s_xive.c | 82 +++++++++++++++++++++++++-- > arch/powerpc/kvm/book3s_xive_native.c | 30 ++++++++-- > arch/powerpc/kvm/powerpc.c | 9 +++ > 5 files changed, 112 insertions(+), 11 deletions(-) >=20 > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/a= sm/kvm_host.h > index 9cc6abdce1b9..ed059c95e56a 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -314,6 +314,7 @@ struct kvm_arch { > #ifdef CONFIG_KVM_XICS > struct kvmppc_xics *xics; > struct kvmppc_xive *xive; > + struct kvmppc_xive *xive_devices[2]; This is essentially a bool-indexed array, which is a fairly confusing thing. It thing using separate fields with meaningful names would be better. It'll mean slightly more code in get_device() but not really that much, and I think the intent will be clearer. > struct kvmppc_passthru_irqmap *pimap; > #endif > struct kvmppc_ops *kvm_ops; > diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xiv= e.h > index e011622dc038..426146332984 100644 > --- a/arch/powerpc/kvm/book3s_xive.h > +++ b/arch/powerpc/kvm/book3s_xive.h > @@ -283,6 +283,7 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_= block *sb); > int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio); > int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio, > bool single_escalation); > +struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type); > =20 > #endif /* CONFIG_KVM_XICS */ > #endif /* _KVM_PPC_BOOK3S_XICS_H */ > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xiv= e.c > index 480a3fc6b9fd..7dec0c350a14 100644 > --- a/arch/powerpc/kvm/book3s_xive.c > +++ b/arch/powerpc/kvm/book3s_xive.c > @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct k= vm_vcpu *vcpu) > void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) > { > struct kvmppc_xive_vcpu *xc =3D vcpu->arch.xive_vcpu; > - struct kvmppc_xive *xive =3D xc->xive; > + struct kvmppc_xive *xive; > int i; > =20 > + if (!kvmppc_xics_enabled(vcpu)) > + return; > + > + if (!xc) > + return; > + > pr_devel("cleanup_vcpu(cpu=3D%d)\n", xc->server_num); > =20 > + xive =3D xc->xive; > + > /* Ensure no interrupt is still routed to that VP */ > xc->valid =3D false; > kvmppc_xive_disable_vcpu_interrupts(vcpu); > @@ -1141,6 +1149,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcp= u) > } > /* Free the VP */ > kfree(xc); > + > + /* Cleanup the vcpu */ > + vcpu->arch.irq_type =3D KVMPPC_IRQ_DEFAULT; > + vcpu->arch.xive_vcpu =3D NULL; > } > =20 > int kvmppc_xive_connect_vcpu(struct kvm_device *dev, > @@ -1158,7 +1170,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev, > } > if (xive->kvm !=3D vcpu->kvm) > return -EPERM; > - if (vcpu->arch.irq_type) > + if (vcpu->arch.irq_type !=3D KVMPPC_IRQ_DEFAULT) > return -EBUSY; > if (kvmppc_xive_find_server(vcpu->kvm, cpu)) { > pr_devel("Duplicate !\n"); > @@ -1824,12 +1836,39 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_= src_block *sb) > } > } > =20 > -static void kvmppc_xive_free(struct kvm_device *dev) > +/* > + * Called when device fd is closed > + */ > +static void kvmppc_xive_release(struct kvm_device *dev) > { > struct kvmppc_xive *xive =3D dev->private; > struct kvm *kvm =3D xive->kvm; > + struct kvm_vcpu *vcpu; > int i; > =20 > + pr_devel("Releasing xive device\n"); > + > + /* > + * When releasing the KVM device fd, the vCPUs can still be > + * running and we should clean up the vCPU interrupt > + * presenters first. > + */ > + if (atomic_read(&kvm->online_vcpus) !=3D 0) { > + /* > + * call kick_all_cpus_sync() to ensure that all CPUs > + * have executed any pending interrupts > + */ > + if (is_kvmppc_hv_enabled(kvm)) > + kick_all_cpus_sync(); > + > + /* > + * TODO: There is still a race window with the early > + * checks in kvmppc_native_connect_vcpu() > + */ > + kvm_for_each_vcpu(i, vcpu, kvm) > + kvmppc_xive_cleanup_vcpu(vcpu); > + } > + > debugfs_remove(xive->dentry); > =20 > if (kvm) > @@ -1846,11 +1885,42 @@ static void kvmppc_xive_free(struct kvm_device *d= ev) > if (xive->vp_base !=3D XIVE_INVALID_VP) > xive_native_free_vp_block(xive->vp_base); > =20 > + /* > + * A reference of the kvmppc_xive pointer is now kept under > + * the xive_devices[] array of the machine for reuse. It is > + * freed when the VM is destroyed for now until we fix all the > + * execution paths. > + */ > =20 > - kfree(xive); > kfree(dev); > } > =20 > +/* > + * When the guest chooses the interrupt mode (XICS legacy or XIVE > + * native), the VM will switch of KVM device. The previous device will > + * be "released" before the new one is created. > + * > + * Until we are sure all execution paths are well protected, provide a > + * fail safe (transitional) method for device destruction, in which > + * the XIVE device pointer is recycled and not directly freed. > + */ > +struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type) > +{ > + struct kvmppc_xive *xive; > + bool xive_native_index =3D type =3D=3D KVM_DEV_TYPE_XIVE; > + > + xive =3D kvm->arch.xive_devices[xive_native_index]; > + > + if (!xive) { > + xive =3D kzalloc(sizeof(*xive), GFP_KERNEL); > + kvm->arch.xive_devices[xive_native_index] =3D xive; > + } else { > + memset(xive, 0, sizeof(*xive)); > + } > + > + return xive; > +} > + > static int kvmppc_xive_create(struct kvm_device *dev, u32 type) > { > struct kvmppc_xive *xive; > @@ -1859,7 +1929,7 @@ static int kvmppc_xive_create(struct kvm_device *de= v, u32 type) > =20 > pr_devel("Creating xive for partition\n"); > =20 > - xive =3D kzalloc(sizeof(*xive), GFP_KERNEL); > + xive =3D kvmppc_xive_get_device(kvm, type); > if (!xive) > return -ENOMEM; > =20 > @@ -2024,7 +2094,7 @@ struct kvm_device_ops kvm_xive_ops =3D { > .name =3D "kvm-xive", > .create =3D kvmppc_xive_create, > .init =3D kvmppc_xive_init, > - .destroy =3D kvmppc_xive_free, > + .release =3D kvmppc_xive_release, > .set_attr =3D xive_set_attr, > .get_attr =3D xive_get_attr, > .has_attr =3D xive_has_attr, > diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/boo= k3s_xive_native.c > index 62648f833adf..a99af2766ce3 100644 > --- a/arch/powerpc/kvm/book3s_xive_native.c > +++ b/arch/powerpc/kvm/book3s_xive_native.c > @@ -964,15 +964,29 @@ static int kvmppc_xive_native_has_attr(struct kvm_d= evice *dev, > return -ENXIO; > } > =20 > -static void kvmppc_xive_native_free(struct kvm_device *dev) > +/* > + * Called when device fd is closed > + */ > +static void kvmppc_xive_native_release(struct kvm_device *dev) > { > struct kvmppc_xive *xive =3D dev->private; > struct kvm *kvm =3D xive->kvm; > + struct kvm_vcpu *vcpu; > int i; > =20 > debugfs_remove(xive->dentry); > =20 > - pr_devel("Destroying xive native device\n"); > + pr_devel("Releasing xive native device\n"); > + > + /* > + * When releasing the KVM device fd, the vCPUs can still be > + * running and we should clean up the vCPU interrupt > + * presenters first. > + */ > + if (atomic_read(&kvm->online_vcpus) !=3D 0) { > + kvm_for_each_vcpu(i, vcpu, kvm) > + kvmppc_xive_native_cleanup_vcpu(vcpu); > + } > =20 > if (kvm) > kvm->arch.xive =3D NULL; > @@ -987,7 +1001,13 @@ static void kvmppc_xive_native_free(struct kvm_devi= ce *dev) > if (xive->vp_base !=3D XIVE_INVALID_VP) > xive_native_free_vp_block(xive->vp_base); > =20 > - kfree(xive); > + /* > + * A reference of the kvmppc_xive pointer is now kept under > + * the xive_devices[] array of the machine for reuse. It is > + * freed when the VM is destroyed for now until we fix all the > + * execution paths. > + */ > + > kfree(dev); > } > =20 > @@ -1002,7 +1022,7 @@ static int kvmppc_xive_native_create(struct kvm_dev= ice *dev, u32 type) > if (kvm->arch.xive) > return -EEXIST; > =20 > - xive =3D kzalloc(sizeof(*xive), GFP_KERNEL); > + xive =3D kvmppc_xive_get_device(kvm, type); > if (!xive) > return -ENOMEM; > =20 > @@ -1182,7 +1202,7 @@ struct kvm_device_ops kvm_xive_native_ops =3D { > .name =3D "kvm-xive-native", > .create =3D kvmppc_xive_native_create, > .init =3D kvmppc_xive_native_init, > - .destroy =3D kvmppc_xive_native_free, > + .release =3D kvmppc_xive_native_release, > .set_attr =3D kvmppc_xive_native_set_attr, > .get_attr =3D kvmppc_xive_native_get_attr, > .has_attr =3D kvmppc_xive_native_has_attr, > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index f54926c78320..9b9c8a9f28b5 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -501,6 +501,15 @@ void kvm_arch_destroy_vm(struct kvm *kvm) > =20 > mutex_unlock(&kvm->lock); > =20 > + /* > + * Free the XIVE devices which are not directly freed by the > + * device 'release' method > + */ > + for (i =3D 0; i < ARRAY_SIZE(kvm->arch.xive_devices); i++) { > + kfree(kvm->arch.xive_devices[i]); > + kvm->arch.xive_devices[i] =3D NULL; > + } > + > /* drop the module reference */ > module_put(kvm->arch.kvm_ops->owner); > } --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --smOfPzt+Qjm5bNGJ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlywMVoACgkQbDjKyiDZ s5IV+Q//R1gQc2bQKdZsYtN9hX3clib36r5aJgPiAVzQe6Ly+2fG1uKaAPo3N0k0 O/RRLNYCNJprmAEliXMYq58hr0nXkMHjgoQdpHBRT0L9cvRdpBtUb8AEU/2Q3Nn+ Ntw/VpeGkO+oSd1dPknS5ul2xQJXnQPrZFmpRmRjcpBP0Zet3bzzU7iaTNcRmcl3 Xro857/w7i0KMNAdoXxU9YVeq/4S8BeIWDww3DKxGsdvRHe7okVNDKueuqH6AqEL bYUEB/HakC00MKNIAIl7czYobzXyGAQcnLBW4Dp5HDexn4PmFr2zAJ3VUaQtwJ0D feNYRMPhH4k7SSLyx/IrArXBFTZvJKExpF0g939IWpr7Mz2MEGfjF75Vz8N6Qout 1AsthJU/vxv1RlUy078TLh2IqMWVHb73n3Saqpbi+IpP4wejokuz5q8PLZNFKO+O iq+zXZApuKbooE8qePGTnc8/DAj7kN2POLzJyDoVu8jOm27wn0wwUszmblpy4Lfa dEBr8bfSub7rc+8lCTYBLNo55nmTFYLKsehFzlHjv1C68fc8YYBs0z1BFt5029BK GYYNTtd3EbW0WbOIdk7fnW86bAyWdSdzyZO1B5kuBTzO3YlnhJRvZEmwImfuDbmX MdZKPzF7uwVDIJkQm38qvrI1jKdft6tleDLXfc1f4UQV4s0Q1r8= =+zsL -----END PGP SIGNATURE----- --smOfPzt+Qjm5bNGJ--