From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 3/7] xen: credit: consider tickled pCPUs as busy. Date: Fri, 7 Apr 2017 14:53:46 +0200 Message-ID: <1491569626.3287.11.camel@citrix.com> References: <149146456487.21348.8554211499146017782.stgit@Solace.fritz.box> <149146658605.21348.4731246857836407966.stgit@Solace.fritz.box> <90dc30e6-635c-c3b3-548a-5bcb7aefca3d@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2110476575960521164==" Return-path: Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cwTOl-0000pJ-I6 for xen-devel@lists.xenproject.org; Fri, 07 Apr 2017 12:53:55 +0000 In-Reply-To: <90dc30e6-635c-c3b3-548a-5bcb7aefca3d@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: George Dunlap , xen-devel@lists.xenproject.org Cc: George Dunlap , Anshul Makkar List-Id: xen-devel@lists.xenproject.org --===============2110476575960521164== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-Xp/J4ucIw/CahdwkS6+q" --=-Xp/J4ucIw/CahdwkS6+q Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2017-04-07 at 11:56 +0100, George Dunlap wrote: > On 06/04/17 09:16, Dario Faggioli wrote: > > Currently, it can happen that __runq_tickle(), > > running on pCPU 2 because vCPU x woke up, decides > > to tickle pCPU 3, because it's idle. Just after > > that, but before pCPU 3 manages to schedule and > > pick up x, either __runq_tickel() or > > __csched_cpu_pick(), running on pCPU 6, sees that > > idle pCPUs are 0, 1 and also 3, and for whatever > > reason it also chooses 3 for waking up (or > > migrating) vCPU y. > >=20 > > When pCPU 3 goes through the scheduler, it will > > pick up, say, vCPU x, and y will sit in its > > runqueue, even if there are idle pCPUs. > >=20 > > Alleviate this by marking a pCPU to be idle > > right away when tickling it (like, e.g., it happens > > in Credit2). > >=20 > > Note that this does not eliminate the race. That > > is not possible without introducing proper locking > > for the cpumasks the scheduler uses. It significantly > > reduces the window during which it can happen, though. > >=20 > > Introduce proper locking for the cpumask can, in > > theory, be done, and may be investigated in future. > > It is a significant amount of work to do it properly > > (e.g., avoiding deadlock), and it is likely to adversely > > affect scalability, and so it may be a path it is just > > not worth following. > >=20 > > Signed-off-by: Dario Faggioli > > --- > > Cc: George Dunlap > > Cc: Anshul Makkar > > --- > > =C2=A0xen/common/sched_credit.c |=C2=A0=C2=A0=C2=A022 +++++++++++++++++= ++--- > > =C2=A01 file changed, 19 insertions(+), 3 deletions(-) > >=20 > > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c > > index 5a3f13f..c753089 100644 > > --- a/xen/common/sched_credit.c > > +++ b/xen/common/sched_credit.c > > @@ -489,7 +489,17 @@ static inline void __runq_tickle(struct > > csched_vcpu *new) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__trace_var(TRC_CSCHED_TICKLE, 1, sizeof(c= pu), > > &cpu); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > > =C2=A0 > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Send scheduler inte= rrupts to designated CPUs */ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Mark the desig= nated CPUs as busy and send them all the > > scheduler > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* interrupt. We = need the for_each_cpu for dealing with > > the > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* !opt_tickle_on= e_idle case. We must use > > cpumask_clear_cpu() and > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* can't use cpum= ask_andnot(), because prv->idlers needs > > atomic access. > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* In the default= (and most common) case, when > > opt_rickle_one_idle is > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* true, the loop= does only one step, and only one bit is > > cleared. > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0for_each_cpu(cpu, &mas= k) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0cpumask_clear_cpu(cpu, prv->idlers); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cpumask_raise_sof= tirq(&mask, SCHEDULE_SOFTIRQ); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0else > > @@ -985,6 +995,8 @@ csched_vcpu_acct(struct csched_private *prv, > > unsigned int cpu) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0SCHED_VCPU_STAT_CRANK(svc, migrate_r); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0SCHED_STAT_CRANK(migrate_running); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0set_bit(_VPF_migrating, ¤t->pause_flags); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0ASSERT(!cpumask_test_cpu(cpu, > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0CSCHED_PRIV(per_cpu(scheduler > > , cpu))->idlers)); >=20 > What are these about?=C2=A0=C2=A0Is this just to double-check that the "i= dler > accounting" logic is correct? > No, raising the SCHEDULE_SOFTIRQ (which is what we do here, right below the added ASSERT(), and also in the other place where I added another one) basically means tickling the cpu we raise it for. Therefore, according to the very purpose of this patch, we should clear the bit corresponding to the cpu itself in the idler mask. In practise, that is not necessary, because it (in both cases) that happens to be cleared already. The ASSERT()-s are for both making this explicit, and check/enforce it. I appreciate it's not that clear, though. I'll add comments. Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-Xp/J4ucIw/CahdwkS6+q Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJY54vaAAoJEBZCeImluHPuY6AP/j3g/XeUXKdyy9xWDno5Bggr vIJf6l37+he6PgO1sMdx3rYOvCAFV/MAPjyw/5eCNjglzQZAYuC+DdTr9hQuAlkH jIzWbmOQNMgWB3QgCsO/1FN84/TJmmg80dfKAvuLg2pADPsKPMixwVrC6D1PlUNL YBtiGSSh7D0pPa3P2B3rer/PoRZbPBPYE8up7c6eQ+guFDCe+bOhQqRolvQigkKt 6ZXnLOj2es45DejF/YdB+WGFYBnTmADnrhlfh2gflGDxf+cWPz5va+Lxaj845Eph o4ukf26PLon5AOM+J6PKN8e1k+rPupKCEyMxTyRTbb/lW9vK5zDIT4yBohkrP8UN 4m4X6S6V820A+KNU7fuo52H74VBs5Se2EP/BaSKF30opab5AS2g5W14k6Xiga/Gu fmwSmlqYb0KvavNKJcO/T+Q25R1aZOeo6pmmUPBXAuy7gcqZnOE9A+IrcR96ZAN8 Q5M5FZW+EBWJwmS82b5xzkcnjP5kEkAKisFIw9UHfU+dnEdTdqIdq2uVKhTFHUaL hvN3T6Eu2dIxTlEZDtIr7xkDnONOtHCFSotZyQAX0qhSh1+Nen7ibJ+FyOFYKnYc HIHX+DIfT7lrQtY+9+Opu5/p5Uyj0auXE1eQC5l4uDg32Lf7Ru/cPCk4plAWDy3n OMQ2pxVXX4z+xCoOYItN =E9Ej -----END PGP SIGNATURE----- --=-Xp/J4ucIw/CahdwkS6+q-- --===============2110476575960521164== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============2110476575960521164==--