From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL] Date: Fri, 24 Feb 2017 17:14:37 +0100 Message-ID: <1487952877.5548.26.camel@citrix.com> References: <4b6337e3-74b3-8ecd-d7a6-aa8a451fb8d3@citrix.com> <58AD5DDA020000780013C9ED@prv-mh.provo.novell.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============7667098525168132483==" Return-path: Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1chIW7-0005K6-IZ for xen-devel@lists.xenproject.org; Fri, 24 Feb 2017 16:14:47 +0000 In-Reply-To: <58AD5DDA020000780013C9ED@prv-mh.provo.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Jan Beulich , Andrew Cooper Cc: George Dunlap , xen-devel , osstest service owner , Juergen Gross List-Id: xen-devel@lists.xenproject.org --===============7667098525168132483== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-D30dsFl3YKD9dYvNvG3i" --=-D30dsFl3YKD9dYvNvG3i Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable [Adding Juergen] On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote: > > > > On 22.02.17 at 01:02, wrote: > > (XEN) Xen call trace: > > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] > > sched_credit2.c#vcpu_is_migrateable+0x22/0x9a > > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] > > sched_credit2.c#csched2_schedule+0x823/0xb4e > > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] schedule.c#schedule+0= x108/0x609 > > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] softirq.c#__do_softir= q+0x7f/0x8a > > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] do_softirq+0x13/0x15 > > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] domain.c#idle_loop+0x= 55/0x62 > > (XEN) > > (XEN) > > (XEN) **************************************** > > (XEN) Panic on CPU 14: > > (XEN) Assertion 'd->cpupool !=3D NULL' failed at > > ...5948.build-amd64/xen/xen/include/xen/sched-if.h:200 > > (XEN) **************************************** > > (XEN) > > (XEN) Manual reset required ('noreboot' specified) > >=20 > > I am guessing the most recent credit2 backports weren't quite so > > safe? >=20 Well, what I'd say we're facing is the surfacing of a latent bug. > However, comparing with the staging version of the file > (which is heavily different), the immediate code involved here isn't > all that different, so I wonder whether (a) this is a problem on > staging too or (b) we're missing another backport. Dario? >=20 So, according to my investigation, this is a genuine race. It affects this branch as well as staging, but it manifests less frequently (or, I should say, very rarely) in the latter. The problem is that the Credit2's load balancer operates not only on runnable vCPUs, but also on blocked, sleeping, and paused ones (and that's by design). In this case, the original domain is in the process of being destroyed, =C2=A0after migration completed, and reaches the point where, within domain_destroy(), we call cpupool_rm_domain(). This remove the domain from any cpupool, and sets d->cpupool =3D NULL. Then, on another pCPU --since the vCPUs of the domain are still around (until we call sched_destroy_vcpu(), which happens much later-- and they also are still assigned to a Credit2 runqueue, balance_load() picks up one of them for moving to another runqueue, and things explode when we realize that the vCPU is actually out of any pool! So, I've thought quite a bit of how to solve this. Possibilities are to act at the Credit2 level, or outside of it. I drafted a couple of solutions only affecting sched_credit2.c, but could not be satisfied with the results. And that's because I ultimately think it should be safe for a scheduler that it can play with a vCPU that it can reach out to, and that means the vCPU must be in a pool. And that's why I came up with the patch below. This is a draft and is on top of staging-4.7. I will properly submit it against staging, if you agree with me it's an ok thing to do. Basically, I anticipate a little bit calling sched_destroy_vcpu(), so that it happens before cpupool_rm_domain(). This ensures that vCPUs have valid cpupool information until the very last moment that they are accessible from a scheduler. --- diff --git a/xen/common/domain.c b/xen/common/domain.c index 45273d4..4db7750 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -643,7 +643,10 @@ int domain_kill(struct domain *d) if ( cpupool_move_domain(d, cpupool0) ) return -ERESTART; for_each_vcpu ( d, v ) + { unmap_vcpu_info(v); + sched_destroy_vcpu(v); + } d->is_dying =3D DOMDYING_dead; /* Mem event cleanup has to go here because the rings=20 * have to be put before we call put_domain. */ @@ -807,7 +810,6 @@ static void complete_domain_destroy(struct rcu_head *he= ad) continue; tasklet_kill(&v->continue_hypercall_tasklet); vcpu_destroy(v); - sched_destroy_vcpu(v); destroy_waitqueue_vcpu(v); } --- Let me know. Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-D30dsFl3YKD9dYvNvG3i Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYsFvuAAoJEBZCeImluHPu150QAL/zJ1a4YmUjEBMIAP2CMXWt AiJUjVB+847O7LBN5/lTWMP/JXP/pbkFdhx7ER9VuH48fBRKilNXylEjWQy2fXWv sCSSwWs7sh7OXRz6vEAgc9Pfu36EWb+NWMmAYrUjms6ZP4vEGjC52a/i/JeI2DSF 7LiIsfBZYWg2Zmtbu9BZIVUCJS38MKqOrYCU+/0EtF4jkjIQPX1ul0Y0Jpmq9J06 3U0/eVspkX9r5R/fCwxokjp0YBVb8asxiw/uBGsD9vQYKqOQVLQvouLkduKDf3jf 71i/NGQTv5IEEDDz99I+k3IUMi+3ZlMEEMbKq2w6tnHrfFNCPSPGTDAGeWcGeraM gC83Ei/7CorUI+RTgRswJbO+SLWTVZ3pNuIgZtj4WDqstGl//5Jhh0vMxyqsno1g c26prNW8ISkFOgsJ3TxRRXMV4RiFWf14As2lBn8HGjgiM4oOj+bYQh6h2ZgChor9 k6ae3HYWvpRoo+m47JjZ4zCNIBalRLn8WeZnuHD7zjiKwz617MDCHIeUvzAQDONv oYcjOgju0+kaP66G+9INYZfo4vXmz4w/a2axcZlrEuV+eleVRrZs2h22mSpgtJSo K3jO2ImkE1W0IKf4ftn2ePrezxaz+EBvyPYofGQ8Y8cJ9Sid6hhPAB0kBOmNZZF7 KqMeLd4mGg1WXxEfL9t0 =TKLD -----END PGP SIGNATURE----- --=-D30dsFl3YKD9dYvNvG3i-- --===============7667098525168132483== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============7667098525168132483==--