From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 1/2] xen: credit1: fix a race when picking initial pCPU for a vCPU Date: Fri, 19 Aug 2016 16:02:26 +0200 Message-ID: <1471615346.6806.108.camel@citrix.com> References: <147151337343.29674.1081345215393715232.stgit@Solace.fritz.box> <147151445223.29674.955105994328843699.stgit@Solace.fritz.box> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1263966729532062997==" Return-path: Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bakNY-0005TB-SE for xen-devel@lists.xenproject.org; Fri, 19 Aug 2016 14:02:36 +0000 In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: George Dunlap , xen-devel@lists.xenproject.org Cc: George Dunlap , Andrew Cooper , Jan Beulich List-Id: xen-devel@lists.xenproject.org --===============1263966729532062997== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-8xAPdvl0WG82tzkWjT4y" --=-8xAPdvl0WG82tzkWjT4y Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2016-08-19 at 13:23 +0100, George Dunlap wrote: > On 18/08/16 11:00, Dario Faggioli wrote: > > @@ -248,6 +245,33 @@ __runq_elem(struct list_head *elem) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return list_entry(elem, struct csched_vcp= u, runq_elem); > > =C2=A0} > > =C2=A0 > > +/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */ > > +static inline bool_t is_runq_idle(unsigned int cpu) > > +{ > > +=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* If we are on cpu, and we are peeking a= t our own runq while > > cpu itself > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* is not idle, that's fine even if we do= n't hold the runq > > lock. In fact, > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* the fact that there is a (non idle!) v= cpu running means > > that at least > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* the idle vcpu is in the runq. And sinc= e only cpu itself > > (via work > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* stealing) can add stuff to the runq, a= nd no other cpu will > > ever steal > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* our idle vcpu, that maks the runq mani= pulations done below > > safe, even > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* without locks. > Thanks for investigating this and figuring out why the lockless > access > hasn't caused a problem before.=C2=A0=C2=A0But relying on this behavior g= oing > forward doesn't really seem like a great idea if we can avoid it. >=20 I totally agree. > We can't grab the pcpu scheduler lock in csched_tick(), or in the > whole > of csched_vcpu_acct() because we grab the private lock in > __csched_vcpu_acct_start() (and that violates the locking > order).=C2=A0=C2=A0But > is there a reason we can't grab the pcpu lock just around the call to > _csched_cpu_pick? >=20 The first version of this patch, here in my stgit patchqueue, looked exactly like that. ISTR I even tested it, and it works. Then I thought that, since in this case it's all about making an ASSERT() happy, it may be a good thing to avoid introducing more contention. Also, I see your point on robustness/reliability. My view is that locking on this path (if not on Credit1 in general) is already so bad, that I don't think it's possible to make it any worse (and hence wans't feeling guilty about taking going the way I did). :-) *BUT* I don't have a too strong opinion, and if you prefer 'take lock' approach, I'm fine with that. I'll send v3. Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-8xAPdvl0WG82tzkWjT4y Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXtxFyAAoJEBZCeImluHPuH+0QANHAFqGT/yiOXgkcRdIUpVK3 D3kl8dtHWJuByu7FJMHCX7G5aiXkgUl+vwPw3RqiPU4/VJAFpENtXc4PSJzrUV0B BIVNO8CclDcTrl+Pei3qhSo/fTbfnurRBxsmEcQ6Qf+NoN6+77NEn71s61bjJWqe 8SPWidoKOGYadlLIsvp6+bOQPhQCQk68Nlaya0GfmRSUfYXanDOpS3cEZIUlQU+2 HyHcaOlf8J4cs7Of75VQSJf5aiEqAeKd3A3acoNsuypUfQbf3vetZjJFnLEFvQqV 3fmOyuNVBUzpDsr6lP/ypU2B8KBBEU5z9Ub6AqPV8DZgR+GX7eDF/Qj/gZhl2WJg APMl3Yg8YqsVoV6se96d1C/vRS0n4fv6tKDnbuwgA0YYOZVduJoCyQPoe42QgmZg ljIwHKd6AloZOHTZcxAC3Y0y2RW+BWxMfPIjKmPB+Ra96sv5rc0PQPXbAw4GBdAm eaAWsQODnNa5wz1ApVEywbwvxdyxeEZO1aBMzYbSNZI6KYgUMiyG98ffhkEaEYXf T6H5KmpmmH/HqPBPHehNpOM9irnOUdOk+bWTVF3XNFTyoZ0o7dexVEL2L0IOPwjF vXGDww+zCmQgzB2GgG2s3tzFjkCnPoUyiRXaqNmNYgrD+5NNGK9r8Cb+m941uhwi jOH2bI/lgclv+ciFp6Er =Ph9N -----END PGP SIGNATURE----- --=-8xAPdvl0WG82tzkWjT4y-- --===============1263966729532062997== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============1263966729532062997==--