From: Dario Faggioli
Subject: Re: [PATCH] xen: credit1: fix a race when picking initial pCPU for a vCPU
Date: Fri, 12 Aug 2016 17:17:16 +0200
Message-ID: <1471015036.6250.100.camel@citrix.com>
In-Reply-To: <9ae42c19-8326-f13c-4c9c-83641c93efcd@citrix.com>
To: George Dunlap, xen-devel@lists.xenproject.org
Cc: George Dunlap, Andrew Cooper, Jan Beulich

On Fri, 2016-08-12 at 10:14 +0100, George Dunlap wrote:
> On 12/08/16 05:07, Dario Faggioli wrote:
> Let me know if you want me to check this in as-is or if you think you
> might send a follow-up patch adding an ASSERT.
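The message does not quote the ASSERT itself. Judging from the call trace and the discussion of csched_tick(), it presumably checks that the runqueue lock is held when a pCPU is picked for a vCPU. The following is only a minimal, single-threaded C model of that kind of precondition check. struct toy_lock, toy_cpu_pick() and the other helpers are hypothetical and are not Xen's spinlock or Credit1 API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-pCPU runqueue lock that records which
 * CPU owns it. This is a single-threaded model, not Xen's spinlock_t. */
struct toy_lock {
    int owner; /* -1 when free, otherwise the id of the owning CPU */
};

static void toy_lock_acquire(struct toy_lock *l, int cpu)
{
    assert(l->owner == -1); /* model only: the lock must be free */
    l->owner = cpu;
}

static void toy_lock_release(struct toy_lock *l, int cpu)
{
    assert(l->owner == cpu); /* only the owner may release it */
    l->owner = -1;
}

static bool toy_lock_held_by(const struct toy_lock *l, int cpu)
{
    return l->owner == cpu;
}

/* Hypothetical pCPU picker whose precondition is "caller holds the
 * runqueue lock of 'cpu'". Where the hypervisor would use an ASSERT()
 * (stopping a debug build), this model returns -1 so it can be tested. */
static int toy_cpu_pick(const struct toy_lock *runq_lock, int cpu)
{
    if (!toy_lock_held_by(runq_lock, cpu))
        return -1;
    return cpu; /* placeholder policy: stay on the current pCPU */
}
```

Calling toy_cpu_pick() without taking the lock first returns -1, which is this model's analogue of the assertion firing.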
I tried adding the ASSERT, and it actually explodes like this:

(XEN) [    4.870128] Xen call trace:
(XEN) [    4.870130]    [] spinlock.c#check_lock+0x42/0x46
(XEN) [    4.870133]    [] _spin_is_locked+0x11/0x4d
(XEN) [    4.870139]    [] sched_credit.c#_csched_cpu_pick+0x1a9/0x632
(XEN) [    4.870142]    [] sched_credit.c#csched_tick+0x1fd/0x385
(XEN) [    4.870146]    [] timer.c#execute_timer+0x47/0x62
(XEN) [    4.870148]    [] timer.c#timer_softirq_action+0xdb/0x22c
(XEN) [    4.870151]    [] softirq.c#__do_softirq+0x7f/0x8a
(XEN) [    4.870153]    [] do_softirq+0x13/0x15
(XEN) [    4.870157]    [] entry.o#process_softirqs+0x21/0x30
(XEN) [    4.870159]
(XEN) [    5.619096]
(XEN) [    5.621085] ****************************************
(XEN) [    5.626536] Panic on CPU 0:
(XEN) [    5.629826] Xen BUG at spinlock.c:48
(XEN) [    5.633895] ****************************************

And if I look at csched_tick(), it is indeed the case that we call csched_vcpu_acct() **without** holding the runq lock. It, in turn, calls things like burn_credits(), accesses current (the running vCPU), and does other stuff, and I'm having a bit of a hard time convincing myself that this is safe... Although it must be, if there have been no issues after all these years. :-O

csched_runq_sort(), called later (still by csched_tick()), acquires the runq lock by itself, and we can't acquire it in csched_tick(), because __csched_vcpu_acct_start() acquires the private lock, and we'd violate the nesting rule (we would be taking the private lock while already holding a runq lock).
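The nesting constraint in the last paragraph can be modelled as a small lock-order checker. This is a hedged, self-contained C sketch, not Xen code. The lock classes, struct held_locks and all helpers are hypothetical. It assumes, as the paragraph implies, that the private scheduler lock must never be acquired while a runqueue lock is already held, and that __csched_vcpu_acct_start() drops the private lock before csched_runq_sort() runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Two lock classes; assumed rule: the private scheduler lock must never
 * be taken while a runqueue lock is already held. */
enum lock_class { LOCK_PRIV = 0, LOCK_RUNQ = 1 };

struct held_locks {
    bool held[2]; /* which lock classes the current CPU holds */
};

/* Record an acquisition; return false if it would break the ordering
 * rule or re-take a lock that is already held. */
static bool acquire(struct held_locks *h, enum lock_class want)
{
    if (h->held[want])
        return false; /* locks are not recursive in this model */
    if (want == LOCK_PRIV && h->held[LOCK_RUNQ])
        return false; /* private lock inside runq lock: inverted nesting */
    h->held[want] = true;
    return true;
}

static void release(struct held_locks *h, enum lock_class which)
{
    assert(h->held[which]);
    h->held[which] = false;
}

/* Simplified model of the existing tick path (assumed): accounting takes
 * and drops the private lock, then the sort takes the runq lock itself. */
static bool tick_current_shape(void)
{
    struct held_locks h = { { false, false } };
    if (!acquire(&h, LOCK_PRIV))   /* __csched_vcpu_acct_start() */
        return false;
    release(&h, LOCK_PRIV);
    if (!acquire(&h, LOCK_RUNQ))   /* csched_runq_sort() */
        return false;
    release(&h, LOCK_RUNQ);
    return true;
}

/* Model of the rejected alternative: take the runq lock in the tick
 * handler first, so accounting would take the private lock inside it. */
static bool tick_holding_runq_lock(void)
{
    struct held_locks h = { { false, false } };
    if (!acquire(&h, LOCK_RUNQ))
        return false;
    return acquire(&h, LOCK_PRIV); /* fails: violates the ordering */
}
```

tick_current_shape() satisfies the rule, while tick_holding_runq_lock() is refused, which is why simply wrapping csched_tick() in the runq lock is not an option.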
In summary, this is looking more complicated than it seemed, and I'll have to look at it again on Tuesday (Monday is a public holiday here). Gosh, how much I hate this scheduler!! :-/

Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)