From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 1/2] xen: fix a (latent) cpupool-related race during domain destroy Date: Fri, 15 Jul 2016 16:23:25 +0200 Message-ID: <1468592605.13039.104.camel@citrix.com> References: <146851288308.22413.4619190133086534604.stgit@Solace.fritz.box> <146851308019.22413.8905002507733716302.stgit@Solace.fritz.box> <5788AF33.2030603@suse.com> <1468577660.13039.78.camel@citrix.com> <5788BCA2.4020404@suse.com> <1468583562.13039.96.camel@citrix.com> <5788DC89.4090009@suse.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============9092271910278885371==" Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bO42C-00046p-HN for xen-devel@lists.xenproject.org; Fri, 15 Jul 2016 14:24:08 +0000 In-Reply-To: <5788DC89.4090009@suse.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Juergen Gross , xen-devel@lists.xenproject.org Cc: George Dunlap , Andrew Cooper , Jan Beulich List-Id: xen-devel@lists.xenproject.org --===============9092271910278885371== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-Szlmbo7jd06N3ERY1H7g" --=-Szlmbo7jd06N3ERY1H7g Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2016-07-15 at 14:52 +0200, Juergen Gross wrote: > On 15/07/16 13:52, Dario Faggioli wrote: > >=20 > > On Fri, 2016-07-15 at 12:36 +0200, Juergen Gross wrote: > > >=20 > > > On 15/07/16 12:14, Dario Faggioli wrote: > > > >=20 > > > > In particular, I'm probably not fully understanding, from that > > > > commit > > > > changelog, what is the set of operations/command that I should > > > > run > > > > to > > > > check whether or not I reintroduced the issue back. > > > You need to create a domain in a cpupool and destroy it again > > > while > > > some dom0 process still is holding a reference to it (resulting > > > in a > > > zombie domain). Then try to destroy the cpupool. > > >=20 > > Ah, I see. I wasn't get the fact that it needed to be a zombie > > domain > > from anywhere. > >=20 > > >=20 > > > >=20 > > > > What am I missing? > > > The domain being a zombie domain might change the picture. Moving > > > it > > > to > > > cpupool0 was failing before my patch and it might do so again > > > with > > > your > > > patch applied. > > >=20 > > Mmmm... I don't immediately see the reason why moving a zombie > > domain > > fails either, but I guess I'll have to try. > Searching through the history I found commit > 934e7baa6c12d19cfaf24e8f8e27d6c6a8b8c5e4 which might has removed the > problematic condition (cpupool->n_dom being non-zero while d->cpupool > was NULL already). >=20 Yeah.. And there's also this: /* =C2=A0* unassign a specific cpu from a cpupool =C2=A0* we must be sure not to run on the cpu to be unassigned! to achieve = this =C2=A0* the main functionality is performed via continue_hypercall_on_cpu o= n a =C2=A0* specific cpu. =C2=A0* if the cpu to be removed is the last one of the cpupool no active d= omain =C2=A0* must be bound to the cpupool. dying domains are moved to cpupool0 a= s they =C2=A0* might be zombies. =C2=A0* possible failures: =C2=A0* - last cpu and still active domains in cpupool =C2=A0* - cpu just being unplugged =C2=A0*/ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) { =C2=A0=C2=A0=C2=A0=C2=A0int work_cpu; =C2=A0=C2=A0=C2=A0=C2=A0int ret; =C2=A0=C2=A0=C2=A0=C2=A0struct domain *d; =C2=A0=C2=A0=C2=A0=C2=A0cpupool_dprintk("cpupool_unassign_cpu(pool=3D%d,cpu= =3D%d)\n", =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0c->cpupool_id, cpu); [...] =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0for_each_domain_in_cpupool(= d, c) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{ =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if = ( !d->is_dying ) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{ =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0ret =3D -EBUSY; =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0break; =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0ret= =3D cpupool_move_domain_locked(d, cpupool0); =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if = ( ret ) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0break; =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} [...] So it really looks like it ought to be possible to move zombies to cpupool0 these days. :-) > > Therefore, I still think this patch is correct, but I'm up for > > investigating further and finding a way to solve the "zombie in > > cpupool" issue as well. > I'm not saying your patch is wrong. I just wanted to give you a hint > about the history of the stuff you are changing. :-) >=20 Sure, and that's much appreciated! :-) > If it is working I'd really prefer it over the current situation. >=20 Right. I've got to leave now. But I'll produce some zombies on Monday, and will see if they move. :-D Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-Szlmbo7jd06N3ERY1H7g Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXiPHdAAoJEBZCeImluHPuhC4P/iVCix1cYY6rAKOtR1cZzYbj 0hwEsiI25u/ZrZLrXD2aEuPetKI1nc2pg2hYCJzIoZDV4gCjR5orSvURQ0tUo1OD wSDEVf7O3P/Dkaq4mS1VY5qrq1JeTPaO+uYYjfm5nd2IW6fxEMf1gt8MaKhg0CGS RzvNZS+H5T2Uu7H5Cp+MLi6QdMI39REiaHZBXGxEKngMfL2OHyFyxJFJXbY1T50c pEcXQG5VF4mp1YCEKE3Swe4E/uQa5F8IosnuCU2DTd9XXfzVcAVgkGFHeNph5fEi VlkLmEKMeMLU30sIF80ek3AbNUwT7k9BlXwJbywyBEsXEVYUNTHGv+qG1N/D/vDN GngpuIZJL4nduAhTJ8YDKiVzmvQusp78tSURx87HFVEfX6pOHxjO6lcCLtZk3DSn pswwzKTMqDr9vzEXk93iwFs30VG36ug51LcSMwSSdadWJuKbEwZV1ele6+SePYeC E0cGfCAbd8Ivp69+g7TEHAVBrk4Kv+Db+xUNKoiVYvfCivgwh0PPauChpc9PQ8kN xfgfyofQKrLpV3l4Es8kCPFBQfu5BNdpIZRC3IS6UT54pjkH/lrfT1zUuVzpl1ra UhYK0FKrTu11X5ivC9Ue0pyFVxXKBZFGYVOsI9/l+H6fS0ebCYT10LE38a2n2Y6K p+aooWrdxwaeEF61ID3q =oDxT -----END PGP SIGNATURE----- --=-Szlmbo7jd06N3ERY1H7g-- --===============9092271910278885371== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============9092271910278885371==--