Date: Wed, 17 Apr 2019 12:05:29 +1000
From: David Gibson
To: Cédric Le Goater
Cc: kvm-ppc@vger.kernel.org, Paul Mackerras, kvm@vger.kernel.org
Subject: Re: [RFC PATCH v4 17/17] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation
Message-ID: <20190417020529.GI32705@umbus.fritz.box>
References: <20190320083751.27001-1-clg@kaod.org> <20190409141347.3029-1-clg@kaod.org> <20190409141347.3029-2-clg@kaod.org> <20190415033219.GC32705@umbus.fritz.box> <2d04c5ba-4965-2f1d-ec98-cae7cf8c7cff@kaod.org>
In-Reply-To: <2d04c5ba-4965-2f1d-ec98-cae7cf8c7cff@kaod.org>
X-Mailing-List: kvm@vger.kernel.org

On Mon, Apr 15, 2019 at 03:48:58PM +0200, Cédric Le Goater wrote:
> On 4/15/19 5:32 AM, David Gibson wrote:
> > On Tue, Apr 09, 2019 at 04:13:47PM +0200, Cédric Le Goater wrote:
> >> When the VM boots, the CAS negotiation process determines which
> >> interrupt mode to use and invokes a machine reset. At that time, any
> >> links to the previous KVM interrupt device should be 'destroyed'
> >> before the new chosen one is created.
> >>
> >> To perform the necessary cleanups in KVM, we extend the KVM device
> >> interface with a new 'release' operation which is called when the file
> >> descriptor of the device is closed.
> >>
> >> Such operations are defined for the XICS-on-XIVE and the XIVE native
> >> KVM devices. They clear the vCPU interrupt presenters that could be
> >> attached and then destroy the device.
> >>
> >> Signed-off-by: Cédric Le Goater
> >> ---
> >>  include/linux/kvm_host.h              |  1 +
> >>  arch/powerpc/kvm/book3s_xive.c        | 50 +++++++++++++++++++++++++--
> >>  arch/powerpc/kvm/book3s_xive_native.c | 23 ++++++++++++
> >>  virt/kvm/kvm_main.c                   | 13 +++++++
> >>  4 files changed, 85 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index 831d963451d8..3b444620d8fc 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1246,6 +1246,7 @@ struct kvm_device_ops {
> >>  	long (*ioctl)(struct kvm_device *dev, unsigned int ioctl,
> >>  		      unsigned long arg);
> >>  	int (*mmap)(struct kvm_device *dev, struct vm_area_struct *vma);
> >> +	void (*release)(struct kvm_device *dev);
> >>  };
> >>
> >>  void kvm_device_get(struct kvm_device *dev);
> >> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> >> index 4d4e1730de84..ba777db849d7 100644
> >> --- a/arch/powerpc/kvm/book3s_xive.c
> >> +++ b/arch/powerpc/kvm/book3s_xive.c
> >> @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
> >>  void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
> >>  {
> >>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> >> -	struct kvmppc_xive *xive = xc->xive;
> >> +	struct kvmppc_xive *xive;
> >>  	int i;
> >>
> >> +	if (!kvmppc_xics_enabled(vcpu))
> >> +		return;
> >> +
> >> +	if (!xc)
> >> +		return;
> >> +
> >>  	pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num);
> >>
> >> +	xive = xc->xive;
> >> +
> >>  	/* Ensure no interrupt is still routed to that VP */
> >>  	xc->valid = false;
> >>  	kvmppc_xive_disable_vcpu_interrupts(vcpu);
> >> @@ -1141,6 +1149,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
> >>  	}
> >>  	/* Free the VP */
> >>  	kfree(xc);
> >> +
> >> +	/* Cleanup the vcpu */
> >> +	vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
> >> +	vcpu->arch.xive_vcpu = NULL;
> >>  }
> >>
> >>  int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
> >> @@ -1158,7 +1170,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
> >>  	}
> >>  	if (xive->kvm != vcpu->kvm)
> >>  		return -EPERM;
> >> -	if (vcpu->arch.irq_type)
> >> +	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
> >>  		return -EBUSY;
> >>  	if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
> >>  		pr_devel("Duplicate !\n");
> >> @@ -1855,6 +1867,39 @@ static void kvmppc_xive_free(struct kvm_device *dev)
> >>  	kfree(dev);
> >>  }
> >>
> >> +static void kvmppc_xive_release(struct kvm_device *dev)
> >> +{
> >> +	struct kvmppc_xive *xive = dev->private;
> >> +	struct kvm *kvm = xive->kvm;
> >> +	struct kvm_vcpu *vcpu;
> >> +	int i;
> >> +
> >> +	pr_devel("Releasing xive device\n");
> >> +
> >> +	/*
> >> +	 * When releasing the KVM device fd, the vCPUs can still be
> >> +	 * running and we should clean up the vCPU interrupt
> >> +	 * presenters first.
> >> +	 */
> >> +	if (atomic_read(&kvm->online_vcpus) != 0) {
> >
> > What prevents online_vcpus from becoming non-zero after this test, but
> > before the kvmppc_xive_free()?
>
> I am not sure what you mean. kvmppc_xive_free() is gone with this patch.
> It has been replaced by kvmppc_xive_release().
>
> > Is the test actually necessary?  The operations below should be safe
> > even if there are no online cpus, yes?
>
> ah, yes. kvm_for_each_vcpu() should be safe to use anyhow.
>
> >> +		/*
> >> +		 * call kick_all_cpus_sync() to ensure that all CPUs
> >> +		 * have executed any pending interrupts
> >> +		 */
> >> +		if (is_kvmppc_hv_enabled(kvm))
> >> +			kick_all_cpus_sync();
> >> +		/*
> >> +		 * TODO: There is still a race window with the early
> >> +		 * checks in kvmppc_native_connect_vcpu()
> >> +		 */
> >
> > That's... not reassuring.  What are the consequences of that race,
>
> a bogus ->xive pointer under the XIVE vCPU
>
> > and what do you plan to do about it?
>
> I don't think this is true any more with the release operation
> which will be called by the last user of the device file.

Ok, so the comment needs updating.

> Anyhow, xc->xive does not seem very useful (just like xc->valid)
> We should try to use only vcpu->kvm->arch.xive instead.
>
> I will propose some preliminary cleanups before introducing the
> new release operation.
>
> >> +		kvm_for_each_vcpu(i, vcpu, kvm)
> >> +			kvmppc_xive_cleanup_vcpu(vcpu);
> >> +	}
> >> +
> >> +	kvmppc_xive_free(dev);
> >> +}
> >> +
> >>  struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type)
> >>  {
> >>  	struct kvmppc_xive *xive;
> >> @@ -2043,6 +2088,7 @@ struct kvm_device_ops kvm_xive_ops = {
> >>  	.name = "kvm-xive",
> >>  	.create = kvmppc_xive_create,
> >>  	.init = kvmppc_xive_init,
> >> +	.release = kvmppc_xive_release,
> >>  	.destroy = kvmppc_xive_free,
> >>  	.set_attr = xive_set_attr,
> >>  	.get_attr = xive_get_attr,
> >> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> >> index 092db0efe628..629da7bf2a89 100644
> >> --- a/arch/powerpc/kvm/book3s_xive_native.c
> >> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> >> @@ -996,6 +996,28 @@ static void kvmppc_xive_native_free(struct kvm_device *dev)
> >>  	kfree(dev);
> >>  }
> >>
> >> +static void kvmppc_xive_native_release(struct kvm_device *dev)
> >> +{
> >> +	struct kvmppc_xive *xive = dev->private;
> >> +	struct kvm *kvm = xive->kvm;
> >> +	struct kvm_vcpu *vcpu;
> >> +	int i;
> >> +
> >> +	pr_devel("Releasing xive native device\n");
> >> +
> >> +	/*
> >> +	 * When releasing the KVM device fd, the vCPUs can still be
> >> +	 * running and we should clean up the vCPU interrupt
> >> +	 * presenters first.
> >> +	 */
> >> +	if (atomic_read(&kvm->online_vcpus) != 0) {
> >
> > Likewise here.
> >
> >> +		kvm_for_each_vcpu(i, vcpu, kvm)
> >> +			kvmppc_xive_native_cleanup_vcpu(vcpu);
> >> +	}
> >> +
> >> +	kvmppc_xive_native_free(dev);
> >> +}
> >> +
> >>  static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
> >>  {
> >>  	struct kvmppc_xive *xive;
> >> @@ -1187,6 +1209,7 @@ struct kvm_device_ops kvm_xive_native_ops = {
> >>  	.name = "kvm-xive-native",
> >>  	.create = kvmppc_xive_native_create,
> >>  	.init = kvmppc_xive_native_init,
> >> +	.release = kvmppc_xive_native_release,
> >>  	.destroy = kvmppc_xive_native_free,
> >>  	.set_attr = kvmppc_xive_native_set_attr,
> >>  	.get_attr = kvmppc_xive_native_get_attr,
> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >> index ea2018ae1cd7..ea2619d5ca98 100644
> >> --- a/virt/kvm/kvm_main.c
> >> +++ b/virt/kvm/kvm_main.c
> >> @@ -2938,6 +2938,19 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
> >>  	struct kvm_device *dev = filp->private_data;
> >>  	struct kvm *kvm = dev->kvm;
> >>
> >> +	if (!dev)
> >> +		return -ENODEV;
> >> +
> >> +	if (dev->kvm != kvm)
> >> +		return -EPERM;
> >> +
> >> +	if (dev->ops->release) {
> >> +		mutex_lock(&kvm->lock);
> >> +		list_del(&dev->vm_node);
> >> +		dev->ops->release(dev);
> >> +		mutex_unlock(&kvm->lock);
> >> +	}
> >> +
> >
> > Wasn't there a big comment that explained that release replaced
> > destroy somewhere?
>
> Yes. I did add a comment in the "V5 errata" series.
>
> I should be sending a v6 this week, to clarify all these attempts
> to solve the device switching.
>
> Thanks,
>
> C.
>
> >
> >>  	kvm_put_kvm(kvm);
> >>  	return 0;
> >>  }
> >
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
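
For illustration only, here is a small userspace sketch of the two points discussed in this thread: an optional 'release' callback invoked when the device fd is closed, and a per-vCPU cleanup loop that needs no online_vcpus != 0 guard because it simply does nothing when the count is zero. It is not kernel code and is not taken from the patch. Every name in it (struct vm, struct vcpu, struct device, sketch_release(), device_fd_close(), MAX_VCPUS) is a hypothetical stand-in, and locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_VCPUS 4	/* hypothetical limit, for the sketch only */

/* Hypothetical userspace stand-ins; none of these are the kernel types. */
struct vcpu {
	int irq_type;		/* 0 plays the role of KVMPPC_IRQ_DEFAULT */
	void *presenter;	/* plays the role of vcpu->arch.xive_vcpu */
};

struct vm {
	int online_vcpus;
	struct vcpu vcpus[MAX_VCPUS];
};

struct device;

struct device_ops {
	void (*release)(struct device *dev);	/* optional, may be NULL */
};

struct device {
	struct vm *vm;
	const struct device_ops *ops;
	int released;		/* set once the release callback has run */
};

/* Detach the presenter and reset the vCPU to its default state. */
static void cleanup_vcpu(struct vcpu *vcpu)
{
	if (!vcpu->presenter)
		return;
	vcpu->presenter = NULL;
	vcpu->irq_type = 0;
}

/* Release callback: clean up every online vCPU, then mark the device. */
static void sketch_release(struct device *dev)
{
	struct vm *vm = dev->vm;
	int i;

	assert(vm != NULL);

	/* No "online_vcpus != 0" test: the loop body never runs at zero. */
	for (i = 0; i < vm->online_vcpus; i++)
		cleanup_vcpu(&vm->vcpus[i]);

	dev->released = 1;
}

/* Models closing the device fd: check dev before reading any field. */
static int device_fd_close(struct device *dev)
{
	if (!dev)
		return -ENODEV;

	if (dev->ops && dev->ops->release)
		dev->ops->release(dev);

	return 0;
}
```

Calling device_fd_close() on a device whose VM has no online vCPUs still runs the callback and returns 0, while a device whose ops leave release unset is closed without any cleanup being attempted.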