From: Dario Faggioli
Subject: Re: [PATCH 06/24] xen: credit2: implement yield()
Date: Thu, 29 Sep 2016 18:05:34 +0200
To: George Dunlap, xen-devel@lists.xenproject.org
Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Tue, 2016-09-13 at 14:33 +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > Alternatively, we can actually _subtract_ some credits from a
> > yielding vcpu. That will sort of make the effect of a call to
> > yield last in time.
> But normally we want the yield to be temporary, right?  The kinds of
> places it typically gets called is when the vcpu is waiting for a
> spinlock held by another (probably pre-empted) vcpu.  Doing a
> permanent credit subtraction will bias the credit algorithm against
> cpus that have a high amount of spinlock contention (since probably
> all the vcpus will be calling yield pretty regularly).

Yes, indeed. Good point, actually. However, one can also think of a
scenario where:
 - A yields, and is descheduled in favour of B, as a consequence of
   that;
 - B runs for just a little while and then blocks;
 - C and A are in the runqueue, and A, without counting the idle bias,
   has more credit than C. So A will be picked up again, even if it
   yielded very recently, and it may still be in the spinlock wait (or
   whatever place it is yielding from in a tight loop).

Well, in this case, A will yield again, and C will be picked, i.e.,
what would have happened in the first place, if we had subtracted
credits from A. (I.e., functionally, this would work the same way,
just with more overhead.)

So, again, can this happen? How frequently, both in absolute and
relative terms? Very hard to tell! So, really...

> Yes, this is simple and should be effective for now.  We can look at
> improving it later.

...glad you also think this. Let's go for it. :-)

> > diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> > @@ -1389,6 +1389,16 @@ Choose the default scheduler.
> >  ### sched\_credit2\_migrate\_resist
> >  > `= `
> >  
> > +### sched\_credit2\_yield\_bias
> > +> `= `
> > +
> > +> Default: `1000`
> > +
> > +Set how much a yielding vcpu will be penalized, in order to
> > +actually give a chance to run to some other vcpu. This is
> > +basically a bias, in favour of the non-yielding vcpus, expressed
> > +in microseconds (default is 1ms).

> Probably add _us to the end to indicate that the number is in
> microseconds.

Good idea, although right now we have "sched_credit2_migrate_resist",
which does not have the suffix. Still, I'm doing as you suggest,
because I like it better, and we'll fix "migrate_resist" later, if we
want consistency.

> > @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
> >      struct list_head *iter;
> >      struct csched2_vcpu *snext = NULL;
> >      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> > +    int yield_bias = 0;
> >  
> >      /* Default to current if runnable, idle otherwise */
> >      if ( vcpu_runnable(scurr->vcpu) )
> > +    {
> > +        /*
> > +         * The way we actually take yields into account is like this:
> > +         * if scurr is yielding, when comparing its credits with other
> > +         * vcpus in the runqueue, act like those other vcpus had
> > +         * yield_bias more credits.
> > +         */
> > +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> > +            yield_bias = CSCHED2_YIELD_BIAS;
> > +
> >          snext = scurr;
> > +    }
> >      else
> >          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> >  
> > @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
> >      list_for_each( iter, &rqd->runq )
> >      {
> >          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> > +        int svc_credit = svc->credit + yield_bias;

> Just curious, why did you decide to add yield_bias to everyone else,
> rather than just subtracting it from snext->credit?

I honestly don't recall. :-) It indeed feels more natural to subtract
it from snext. I've done it that way now; let me give it a test spin
and resend...
> > @@ -2918,6 +2957,14 @@ csched2_init(struct scheduler *ops)
> >      printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
> >             1ULL << opt_load_window_shift);
> >  
> > +    if ( opt_yield_bias < CSCHED2_YIELD_BIAS_MIN )
> > +    {
> > +        printk("WARNING: %s: opt_yield_bias %d too small, resetting\n",
> > +               __func__, opt_yield_bias);
> > +        opt_yield_bias = 1000; /* 1 ms */
> > +    }

> Why do we need a minimum bias?  And why reset it to 1ms rather than
> SCHED2_YIELD_BIAS_MIN?

You know what, I don't think we need that. I probably was thinking
that we may always want to force yield to have _some_ effect, but
there may be (or may well be) someone who just wants to disable it
entirely... and in that case, this check would be in their way. I'll
kill it.
Thanks and regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)