From: Dario Faggioli
Subject: Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths
Date: Thu, 09 Nov 2017 12:01:26 +0100
Message-ID: <1510225286.4517.204.camel@linux.it>
In-Reply-To: <1510223780.3654.1.camel@citrix.com>
To: Sergey Dyasli, JBeulich@suse.com
Cc: Igor Druzhinin, Kevin Tian, Andrew Cooper, anshul makkar, jun.nakajima@intel.com, xen-devel@lists.xenproject.org

On Thu, 2017-11-09 at 10:36 +0000, Sergey Dyasli wrote:
> On Thu, 2017-11-09 at 03:17 -0700, Jan Beulich wrote:
> > > > > On 09.11.17 at 10:54, wrote:
> > > On Tue, 2017-11-07 at 14:24 +0000, Igor Druzhinin wrote:
> > > > Perhaps I should improve my diagram:
> > > > 
> > > > pCPU1: vCPUx of domain X -> migrate to pCPU2 -> switch to idle
> > > > context -> RCU callbacks -> vcpu_destroy(vCPUy of domain Y) ->
> > > > vmx_vcpu_disable_pml() -> vmx_vmcs_clear() (VMCS is trashed at
> > > > this point on pCPU1)
> > > > 
> > > > pCPU2: context switch into vCPUx -> vCPUx.is_running = 1 -> TLB
> > > > flush from context switch to clean TLB on pCPU1
> > > > 
> > > Sorry, there must be something I'm missing (or misunderstanding).
> > > 
> > > What is this code that checks is_running and triggers the TLB
> > > flush?
> > 
> > I don't see where Igor said is_running is being checked around a
> > TLB flush. The TLB flush itself is what happens first thing in
> > context_switch() (and it's really using the TLB flush interface to
> > mainly effect the state flush, with the TLB flush being an implied
> > side effect; I've already got a series of further patches to make
> > this less implicit).
> > 
> > > But, more important, how come you are context switching to
> > > something that has is_running == 1 ? That should not be possible.
> > 
> > That's not what Igor's diagram says - it's indicating the fact that
> > is_running is being set to 1 in the process of context switching
> > into vCPUx.
> 
> Jan, Dario,
> 
Hi,

> Igor was referring to the following situation:
> 
>     pCPU1                                pCPU2
>     =====                                =====
>     current == vCPU1
>     context_switch(next == idle)
>     !! __context_switch() is skipped
>     vcpu_migrate(vCPU1)
>     RCU callbacks
>     vmx_vcpu_destroy()
>     vmx_vcpu_disable_pml()
>     current_vmcs = 0
> 
>                                          schedule(next == vCPU1)
>                                          vCPU1->is_running = 1;
>                                          context_switch(next == vCPU1)
>                                          flush_tlb_mask(&dirty_mask);
> 
>                      <--- IPI
> 
>     __sync_local_execstate()
>     __context_switch(prev == vCPU1)
>     vmx_ctxt_switch_from(vCPU1)
>     vCPU1->is_running == 1
>     !! vmx_vmcs_reload() is skipped
> 
> I hope that this better illustrates the root cause.
> 
Yes, I think this is clearer and easier to understand. But that is
probably a matter of habit and personal taste, so sorry again for
misunderstanding it in its other form.
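To spell out the last two steps of the diagram: vmx_ctxt_switch_from()
reloads the VMCS only when the vCPU is no longer marked as running.
Here is a minimal, self-contained sketch of that shape (the stub types
and comments are mine, for illustration; this is not the literal Xen
source):

    /* Stub declarations, for illustration only. */
    struct vcpu {
        int is_running;   /* set by the scheduler on the new pCPU */
    };
    void vmx_vmcs_reload(struct vcpu *v); /* reload this pCPU's VMCS */
    void vmx_fpu_leave(struct vcpu *v);   /* reads/writes the VMCS   */

    static void vmx_ctxt_switch_from(struct vcpu *v)
    {
        if ( !v->is_running )
            /*
             * In the race above, pCPU2 has already set
             * v->is_running = 1, so this reload is skipped...
             */
            vmx_vmcs_reload(v);

        /*
         * ...and what follows then operates on the VMCS that the RCU
         * callback on pCPU1 has meanwhile cleared (current_vmcs = 0).
         */
        vmx_fpu_leave(v);
    }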
Anyway, as I was trying to explain in my reply to Jan, although in
this situation the issue manifests as a consequence of vCPU migration,
I think it is indeed more general, as in, it does not even require a
second pCPU to be considered:

    pCPU1
    =====
    current == vCPU1
    context_switch(next == idle)
    !! __context_switch() is skipped
    vcpu_migrate(vCPU1)
    anything_that_uses_or_touches_context()

So, it must be anything_that_uses_or_touches_context() --knowing it
will be reading or touching the pCPU context-- that syncs the state,
before using or touching it (which is, e.g., what Jan's patch does).

The only other solution I see is to get rid of lazy context switch.
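To illustrate what "syncing the state before touching it" means in
practice, here is a minimal sketch. Only __sync_local_execstate() is
an actual Xen function here; anything_that_uses_or_touches_context()
is, of course, just my made-up placeholder for any such code path:

    /* Forces the deferred __context_switch() to happen now. */
    void __sync_local_execstate(void);

    /* Hypothetical code path that reads or clobbers pCPU context. */
    static void anything_that_uses_or_touches_context(void)
    {
        /*
         * A lazy switch into idle may have left vCPU state loaded on
         * this pCPU; make sure the pending __context_switch() runs
         * before we read or clobber that state.
         */
        __sync_local_execstate();

        /* ... now it is safe to use or touch the pCPU context ... */
    }

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli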