From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 1/2] VMX: fix VMCS race on context-switch paths Date: Thu, 09 Nov 2017 15:16:02 +0100 Message-ID: <1510236962.4517.220.camel@linux.it> References: <58A596C0020000780013AA84@prv-mh.provo.novell.com> <58A597D8020000780013AAAF@prv-mh.provo.novell.com> <5ca9f140-a574-a8d0-1231-4ce0aec0e124@citrix.com> <5A0177B8020000780018CCC9@prv-mh.provo.novell.com> <1510221291.4517.170.camel@linux.it> <5A043944020000780018D7B6@prv-mh.provo.novell.com> <1510223780.3654.1.camel@citrix.com> <1510225286.4517.204.camel@linux.it> <5A04616F020000780018D92D@prv-mh.provo.novell.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6142068573641463827==" Return-path: Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eCncu-00047Z-Is for xen-devel@lists.xenproject.org; Thu, 09 Nov 2017 14:16:16 +0000 Received: by mail-wr0-f181.google.com with SMTP id y9so5744009wrb.2 for ; Thu, 09 Nov 2017 06:16:07 -0800 (PST) In-Reply-To: <5A04616F020000780018D92D@prv-mh.provo.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Jan Beulich Cc: Sergey Dyasli , Kevin Tian , Igor Druzhinin , AndrewCooper , anshul makkar , "jun.nakajima@intel.com" , "xen-devel@lists.xenproject.org" List-Id: xen-devel@lists.xenproject.org --===============6142068573641463827== Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-Fn5nLDWlKSqjqoS35xSE" --=-Fn5nLDWlKSqjqoS35xSE Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, 2017-11-09 at 06:08 -0700, Jan Beulich wrote: > > > > On 09.11.17 at 12:01, wrote: > >=20 > > pCPU1 > > =3D=3D=3D=3D=3D > > current =3D=3D vCPU1 > > context_switch(next =3D=3D idle) > > !! __context_switch() is skipped > > vcpu_migrate(vCPU1) > > anything_that_uses_or_touches_context() > >=20 > > So, it must be anything_that_uses_or_touches_context() --knowing it > > will be reading or touching the pCPU context-- that syncs the > > state, > > before using or touching it (which is, e.g., what Jan's patch > > does). >=20 > Well, generally after the vcpu_migrate(vCPU1) above we expect > nothing to happen at all on the pCPU, until another (non-idle) > vCPU gets scheduled onto it.=20 > Ah, yes, my bad! What if I take vcpu_migrate() out of the above exec- trace (which is what I wanted to do in my email already)? pCPU1 =3D=3D=3D=3D=3D current =3D=3D vCPU1 context_switch(next =3D=3D idle) !! __context_switch() is skipped anything_that_uses_or_touches_context() Point being, is the underlying and general "issue" here, really bound to migrations, or is it something intrinsic of lazy context switch? I'm saying it's the latter. That being said, sure it makes sense to assume that, if we migrated the vCPU from pCPU1 to pCPU2, it's highly unlikely that it will resume the execution on pCPU1, and hence there is no point in leaving its context there, and we could very well sync. It's a reasonable best-effort measure, but can we rely on it for correctness? I don't think we can. And generalizing the idea enough that we could then rely on it for correctness, tends to be close enough to not doing lazy context switch at all, I think. > The problem is with tasklet / softirq > (and hence also RCU) work.=20 > Yes. > Tasklets already take care of this by > calling sync_local_execstate() before calling the handler. But > for softirqs this isn't really an option; I'm surprised to see that > tasklet code does this independently of what kind of tasklet that > is.=20 > Good point. Weird indeed. > Softirq tasklets aren't used very often, so I wonder if we have > a latent bug here. Otoh, if this was actually fine, adding a similar > call to rcu_do_batch() (or its caller) would be an option, I think. >=20 We can have a look. What about the effect on performance, though? I mean, assuming that lazy context switch does a good job, with respect to that, by avoiding synching in enough case where it is actually not necessary, how do things change if we start to sync at any softirq, even when the handler would have not required that? Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli --=-Fn5nLDWlKSqjqoS35xSE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEES5ssOj3Vhr0WPnOLFkJ4iaW4c+4FAloEYyIACgkQFkJ4iaW4 c+5jsw/9FmFMZJOydIXnNywLmmAaqtVE4lyhSeoX3QlC/vlqO5FMgnr9b2E/c2M6 W4x72R41vc4rMNq4Oc14yvkZ0oEua58hbKV4Swdx3u22gIbuhJMFnoydCQHgqC9A t+1fJg4e9sA9JLJI7F6A5mirTZC59xdlIAIDC+ye7IJZG8PH89IAirC5ybde1ULy r+zmrh8Ht2YFiz6iJOwDluhGIQ+r+nt+ChPEXvx5c9NiaK0nI7iw1abu3IfchfwR 6yAD7uqfnIhH4IhBAC3UioOB8F5SuOSQtV0zkx8plVnG4i1c4+mY1ILmX75pVLg3 b6IGlQRFvpb0Nf0t8HA475ME0EaiW5RoDAlGg/THrT5B6l3+C0jihVCVZTf3Hpsn 54Y6pdU/FpcirgyS6MGgJgSG1HdhNs0vp1/BKBhh05A2Slb9XzwSdu9kCbbwPu4a JHkMLTq2AW3r7xWuvg2m300OQe8E43Bqt0r6DxZ88AnrGlX1yo7adLC2G6wW8Pai XteeH7zEP4ikCR/1oYVATmwyDSBau0Sas0oL76G47tjpH3l/s0QFJEEwmDU2v5UP MOM5+FGsP6DuI2Pt4VRK3j7oUcyQ9ex6O/uKJ1avA9E+FIliPTuaWOLirWHs64oa y1ZVJCE5byie+y8nRP/XmY9h4nMk3eTTV54I1ZFVB4HacU17m0o= =Pgn+ -----END PGP SIGNATURE----- --=-Fn5nLDWlKSqjqoS35xSE-- --===============6142068573641463827== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============6142068573641463827==--