From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31ADCC10F14 for ; Tue, 23 Apr 2019 06:46:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DFBDE20843 for ; Tue, 23 Apr 2019 06:46:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=gibson.dropbear.id.au header.i=@gibson.dropbear.id.au header.b="VloxGMZj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726291AbfDWGqY (ORCPT ); Tue, 23 Apr 2019 02:46:24 -0400 Received: from ozlabs.org ([203.11.71.1]:42159 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725935AbfDWGqW (ORCPT ); Tue, 23 Apr 2019 02:46:22 -0400 Received: by ozlabs.org (Postfix, from userid 1007) id 44pDVy32kTz9sB3; Tue, 23 Apr 2019 16:46:18 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1556001978; bh=0SkOzgwdEyJMzGISdUdi8bnX64kEGVomaw6JljgNwNQ=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=VloxGMZj3FGhWoimhqdYBnbJjRSUurIhA9qXbSu181aECVUgNjZNGQgHNl7gSiNK7 jx5zjvZHi0S30QVBfYnoIb8A8XFv82uQTqASvNPd58tIPKV7jzw7PQjQe+n/PBSCi7 J5IVcume7jXXOgLcTXJudE29lem4rdaT67ymmq8s= Date: Tue, 23 Apr 2019 16:04:15 +1000 From: David Gibson To: =?iso-8859-1?Q?C=E9dric?= Le Goater Cc: kvm-ppc@vger.kernel.org, Paul Mackerras , kvm@vger.kernel.org Subject: Re: [PATCH v6 17/17] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by a 'release' method Message-ID: <20190423060415.GC31496@umbus.fritz.box> References: <20190418103942.2883-1-clg@kaod.org> <20190418103942.2883-18-clg@kaod.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vOmOzSkFvhd7u8Ms" Content-Disposition: inline In-Reply-To: <20190418103942.2883-18-clg@kaod.org> User-Agent: Mutt/1.11.3 (2019-02-01) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org --vOmOzSkFvhd7u8Ms Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Apr 18, 2019 at 12:39:42PM +0200, C=E9dric Le Goater wrote: > When a P9 sPAPR VM boots, the CAS negotiation process determines which > interrupt mode to use (XICS legacy or XIVE native) and invokes a > machine reset to activate the chosen mode. >=20 > We introduce 'release' methods for the XICS-on-XIVE and the XIVE > native KVM devices which are called when the file descriptor of the > device is closed after the TIMA and ESB pages have been unmapped. > They perform the necessary cleanups : clear the vCPU interrupt > presenters that could be attached and then destroy the device. The > 'release' methods replace the 'destroy' methods as 'destroy' is not > called anymore once 'release' is. Compatibility with older QEMU is > nevertheless maintained. >=20 > This is not considered as a safe operation as the vCPUs are still > running and could be referencing the KVM device through their > presenters. To protect the system from any breakage, the kvmppc_xive > objects representing both KVM devices are now stored in an array under > the VM. Allocation is performed on first usage and memory is freed > only when the VM exits. >=20 > Signed-off-by: C=E9dric Le Goater Reviewed-by: David Gibson > --- >=20 > Changes since v5 : >=20 > - replaced the kvmppc_xive device array with named kvmppc_xive structs. > - in kvmppc_xive_cleanup_vcpu(), replaced xc->xive by vcpu->kvm->arch.xi= ve > which is always valid in this context > - in kvmppc_xive_release(), removed the comment on the race. The > device being destroyed when the fd is closed, dev->private should > always be valid when used. Also removed the kick_all_cpus_sync() to > flush the IPIs. =20 > - in the both release operation, removed the test on &kvm->online_vcpus. > - devices are now freed when the machine exits. > =20 > arch/powerpc/include/asm/kvm_host.h | 6 ++- > arch/powerpc/kvm/book3s_xive.h | 1 + > arch/powerpc/kvm/book3s_xive.c | 67 ++++++++++++++++++++++++--- > arch/powerpc/kvm/book3s_xive_native.c | 28 +++++++++-- > arch/powerpc/kvm/powerpc.c | 9 ++++ > 5 files changed, 99 insertions(+), 12 deletions(-) >=20 > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/a= sm/kvm_host.h > index 9cc6abdce1b9..c7abc7c3c6cd 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -313,7 +313,11 @@ struct kvm_arch { > #endif > #ifdef CONFIG_KVM_XICS > struct kvmppc_xics *xics; > - struct kvmppc_xive *xive; > + struct kvmppc_xive *xive; /* Current XIVE device in use */ > + struct { > + struct kvmppc_xive *native; > + struct kvmppc_xive *xics_on_xive; > + } xive_devices; > struct kvmppc_passthru_irqmap *pimap; > #endif > struct kvmppc_ops *kvm_ops; > diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xiv= e.h > index e011622dc038..426146332984 100644 > --- a/arch/powerpc/kvm/book3s_xive.h > +++ b/arch/powerpc/kvm/book3s_xive.h > @@ -283,6 +283,7 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_= block *sb); > int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio); > int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio, > bool single_escalation); > +struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type); > =20 > #endif /* CONFIG_KVM_XICS */ > #endif /* _KVM_PPC_BOOK3S_XICS_H */ > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xiv= e.c > index 480a3fc6b9fd..922689b768e6 100644 > --- a/arch/powerpc/kvm/book3s_xive.c > +++ b/arch/powerpc/kvm/book3s_xive.c > @@ -1100,9 +1100,15 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kv= m_vcpu *vcpu) > void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) > { > struct kvmppc_xive_vcpu *xc =3D vcpu->arch.xive_vcpu; > - struct kvmppc_xive *xive =3D xc->xive; > + struct kvmppc_xive *xive =3D vcpu->kvm->arch.xive; > int i; > =20 > + if (!kvmppc_xics_enabled(vcpu)) > + return; > + > + if (!xc) > + return; > + > pr_devel("cleanup_vcpu(cpu=3D%d)\n", xc->server_num); > =20 > /* Ensure no interrupt is still routed to that VP */ > @@ -1141,6 +1147,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcp= u) > } > /* Free the VP */ > kfree(xc); > + > + /* Cleanup the vcpu */ > + vcpu->arch.irq_type =3D KVMPPC_IRQ_DEFAULT; > + vcpu->arch.xive_vcpu =3D NULL; > } > =20 > int kvmppc_xive_connect_vcpu(struct kvm_device *dev, > @@ -1158,7 +1168,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev, > } > if (xive->kvm !=3D vcpu->kvm) > return -EPERM; > - if (vcpu->arch.irq_type) > + if (vcpu->arch.irq_type !=3D KVMPPC_IRQ_DEFAULT) > return -EBUSY; > if (kvmppc_xive_find_server(vcpu->kvm, cpu)) { > pr_devel("Duplicate !\n"); > @@ -1824,12 +1834,26 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_= src_block *sb) > } > } > =20 > -static void kvmppc_xive_free(struct kvm_device *dev) > +/* > + * Called when device fd is closed > + */ > +static void kvmppc_xive_release(struct kvm_device *dev) > { > struct kvmppc_xive *xive =3D dev->private; > struct kvm *kvm =3D xive->kvm; > + struct kvm_vcpu *vcpu; > int i; > =20 > + pr_devel("Releasing xive device\n"); > + > + /* > + * When releasing the KVM device fd, the vCPUs can still be > + * running and we should clean up the vCPU interrupt > + * presenters first. > + */ > + kvm_for_each_vcpu(i, vcpu, kvm) > + kvmppc_xive_cleanup_vcpu(vcpu); > + > debugfs_remove(xive->dentry); > =20 > if (kvm) > @@ -1846,11 +1870,42 @@ static void kvmppc_xive_free(struct kvm_device *d= ev) > if (xive->vp_base !=3D XIVE_INVALID_VP) > xive_native_free_vp_block(xive->vp_base); > =20 > + /* > + * A reference of the kvmppc_xive pointer is now kept under > + * the xive_devices struct of the machine for reuse. It is > + * freed when the VM is destroyed for now until we fix all the > + * execution paths. > + */ > =20 > - kfree(xive); > kfree(dev); > } > =20 > +/* > + * When the guest chooses the interrupt mode (XICS legacy or XIVE > + * native), the VM will switch of KVM device. The previous device will > + * be "released" before the new one is created. > + * > + * Until we are sure all execution paths are well protected, provide a > + * fail safe (transitional) method for device destruction, in which > + * the XIVE device pointer is recycled and not directly freed. > + */ > +struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type) > +{ > + struct kvmppc_xive **kvm_xive_device =3D type =3D=3D KVM_DEV_TYPE_XIVE ? > + &kvm->arch.xive_devices.native : > + &kvm->arch.xive_devices.xics_on_xive; > + struct kvmppc_xive *xive =3D *kvm_xive_device; > + > + if (!xive) { > + xive =3D kzalloc(sizeof(*xive), GFP_KERNEL); > + *kvm_xive_device =3D xive; > + } else { > + memset(xive, 0, sizeof(*xive)); > + } > + > + return xive; > +} > + > static int kvmppc_xive_create(struct kvm_device *dev, u32 type) > { > struct kvmppc_xive *xive; > @@ -1859,7 +1914,7 @@ static int kvmppc_xive_create(struct kvm_device *de= v, u32 type) > =20 > pr_devel("Creating xive for partition\n"); > =20 > - xive =3D kzalloc(sizeof(*xive), GFP_KERNEL); > + xive =3D kvmppc_xive_get_device(kvm, type); > if (!xive) > return -ENOMEM; > =20 > @@ -2024,7 +2079,7 @@ struct kvm_device_ops kvm_xive_ops =3D { > .name =3D "kvm-xive", > .create =3D kvmppc_xive_create, > .init =3D kvmppc_xive_init, > - .destroy =3D kvmppc_xive_free, > + .release =3D kvmppc_xive_release, > .set_attr =3D xive_set_attr, > .get_attr =3D xive_get_attr, > .has_attr =3D xive_has_attr, > diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/boo= k3s_xive_native.c > index 62648f833adf..0497272a72fa 100644 > --- a/arch/powerpc/kvm/book3s_xive_native.c > +++ b/arch/powerpc/kvm/book3s_xive_native.c > @@ -964,15 +964,27 @@ static int kvmppc_xive_native_has_attr(struct kvm_d= evice *dev, > return -ENXIO; > } > =20 > -static void kvmppc_xive_native_free(struct kvm_device *dev) > +/* > + * Called when device fd is closed > + */ > +static void kvmppc_xive_native_release(struct kvm_device *dev) > { > struct kvmppc_xive *xive =3D dev->private; > struct kvm *kvm =3D xive->kvm; > + struct kvm_vcpu *vcpu; > int i; > =20 > debugfs_remove(xive->dentry); > =20 > - pr_devel("Destroying xive native device\n"); > + pr_devel("Releasing xive native device\n"); > + > + /* > + * When releasing the KVM device fd, the vCPUs can still be > + * running and we should clean up the vCPU interrupt > + * presenters first. > + */ > + kvm_for_each_vcpu(i, vcpu, kvm) > + kvmppc_xive_native_cleanup_vcpu(vcpu); > =20 > if (kvm) > kvm->arch.xive =3D NULL; > @@ -987,7 +999,13 @@ static void kvmppc_xive_native_free(struct kvm_devic= e *dev) > if (xive->vp_base !=3D XIVE_INVALID_VP) > xive_native_free_vp_block(xive->vp_base); > =20 > - kfree(xive); > + /* > + * A reference of the kvmppc_xive pointer is now kept under > + * the xive_devices struct of the machine for reuse. It is > + * freed when the VM is destroyed for now until we fix all the > + * execution paths. > + */ > + > kfree(dev); > } > =20 > @@ -1002,7 +1020,7 @@ static int kvmppc_xive_native_create(struct kvm_dev= ice *dev, u32 type) > if (kvm->arch.xive) > return -EEXIST; > =20 > - xive =3D kzalloc(sizeof(*xive), GFP_KERNEL); > + xive =3D kvmppc_xive_get_device(kvm, type); > if (!xive) > return -ENOMEM; > =20 > @@ -1182,7 +1200,7 @@ struct kvm_device_ops kvm_xive_native_ops =3D { > .name =3D "kvm-xive-native", > .create =3D kvmppc_xive_native_create, > .init =3D kvmppc_xive_native_init, > - .destroy =3D kvmppc_xive_native_free, > + .release =3D kvmppc_xive_native_release, > .set_attr =3D kvmppc_xive_native_set_attr, > .get_attr =3D kvmppc_xive_native_get_attr, > .has_attr =3D kvmppc_xive_native_has_attr, > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index f54926c78320..7f41a68d8094 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -501,6 +501,15 @@ void kvm_arch_destroy_vm(struct kvm *kvm) > =20 > mutex_unlock(&kvm->lock); > =20 > + /* > + * Free the XIVE devices which are not directly freed by the > + * device 'release' method > + */ > + kfree(kvm->arch.xive_devices.native); > + kvm->arch.xive_devices.native =3D NULL; > + kfree(kvm->arch.xive_devices.xics_on_xive); > + kvm->arch.xive_devices.xics_on_xive =3D NULL; > + > /* drop the module reference */ > module_put(kvm->arch.kvm_ops->owner); > } --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --vOmOzSkFvhd7u8Ms Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAly+qt0ACgkQbDjKyiDZ s5IfjA/9E2vYKaTcJvVa9eAk+3kELq6LznSTibCd8H4I890moce7rbl0Y9PnZ4EJ cNgcNpAGLtZeiqDgeMhQ1Nb8Fw3vsz74eg3s3hjwZsfdkHaYwL0fh51fRv5/bz/m p+2HM+ksVmucVmnImz1xmr4kSFw//ga3VHMU850Mo1Zaalja4AtRWacobSbWVUMa 4ZSTGhqTZDjM05PGW95mbP1f8lgZ9XQvrWb9P63Lh2BTW/xO7BH9sP26cR4cXkEd prg+3Utu11FZe6gZ+qC5PbAJkq43yfiTkRk7nbA/ilPcL5rt4msPQ2O4MuWX5EAO urEjg2SWvtCUvwyY0a2menWJwYz4x1O/UQovAGt+i6HhnoYCk6fxG0WW978UqNFm AR37FwP7JSfqhpL/cGubyk64rSRqugi1/0woWuj9TWEoWopmniot6GTCqQGQeg3N NihnPuPLaowIdZq6aL6XIUpbG+flWcf4T2QPlKPG0K2b3PE37+AmCWs2Hbfgvhn+ 2gyXrP6FusEwNn0sVMW7YxGRmXShci4jaYkEcLisL6wGSiqTRPlWctfpoZ9yVsk7 hoKpFCdxP4FDSg5pInEdGCpMSwqcuNvGhYicg5EdyojE2fnjDQrqT42QTK43cjKO ohBZcAzH3srY+5Nuiy+Sgr9EvCmpZAVSs35Gs5yoVBu+43ZX3wc= =ei7l -----END PGP SIGNATURE----- --vOmOzSkFvhd7u8Ms--