From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 1/1] ratelimit: Implement rate limit for credit2 scheduler Rate limit assures that a vcpu will execute for a minimum amount of time before being put at the back of a queue or being preempted by higher priority thread. Date: Fri, 22 Jul 2016 12:36:31 +0200 Message-ID: <1469183791.13039.288.camel@citrix.com> References: <1468844544-2229-1-git-send-email-anshul.makkar@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6650042841794355202==" Return-path: In-Reply-To: <1468844544-2229-1-git-send-email-anshul.makkar@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Anshul Makkar , xen-devel@lists.xen.org Cc: george.dunlap@eu.citrix.com List-Id: xen-devel@lists.xenproject.org --===============6650042841794355202== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-2fBekMs+xoamgtaQteOy" --=-2fBekMs+xoamgtaQteOy Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, 2016-07-18 at 13:22 +0100, Anshul Makkar wrote: > Hey, Anshul. Thanks, and sorry for the delay in reviewing. This version is an improvement, but it looks to me that you've missed a few of the review comments to v1. > It introduces a minimum amount of latency=20 > "introduces context-switch rate-limiting" > to enable a VM to batch its work and > it also ensures that system is not spending most of its time in > VMEXIT/VMENTRY because of VM that is waking/sleeping at high rate. >=20 > ratelimit can be disabled by setting it to 0. >=20 > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 8b95a47..68bcdb8 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c >=C2=A0 > @@ -1601,6 +1602,34 @@ csched2_dom_cntl( > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return rc; > =C2=A0} > =C2=A0 > +static int csched2_sys_cntl(const struct scheduler *ops, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0struct xen_sysctl_scheduler_op *sc) > +{ > +=C2=A0=C2=A0=C2=A0=C2=A0int rc =3D -EINVAL; > +=C2=A0=C2=A0=C2=A0=C2=A0xen_sysctl_credit_schedule_t *params =3D &sc->u.= sched_credit; > +=C2=A0=C2=A0=C2=A0=C2=A0struct csched2_private *prv =3D CSCHED2_PRIV(ops= ); > +=C2=A0=C2=A0=C2=A0=C2=A0unsigned long flags; > + > +=C2=A0=C2=A0=C2=A0=C2=A0switch (sc->cmd ) > +=C2=A0=C2=A0=C2=A0=C2=A0{ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0case XEN_SYSCTL_SCHEDOP_= putinfo: > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= if ( params->ratelimit_us && > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0( params->ratelimit_us < CSCHED2_MIN_TIMER || > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0params->ratelimit_us > > MICROSECS(CSCHED2_MAX_TIMER) )) > CSCHED2_MIN_TIMER and CSCHED2_MAX_TIMER are defined as follows: #define CSCHED2_MIN_TIMER=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0MICROSECS(500) #define CSCHED2_MAX_TIMER=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0MILLISECS(2) Which basically means they're value is 500*1000=3D500000 and 2*1000000=3D2000000. ratelimit_us is specified assuming it will be in microseconds. So, basically, you're forcing the value stored in ratelimit_us to be at least 500000, which means 500000 microseconds, which means 500 milliseconds, which is not what we want! I remember saying already (although, it may have be in pvt, not on this list) that I think we should just use=C2=A0XEN_SYSCTL_SCHED_RATELIMIT_MAX and=C2=A0XEN_SYSCTL_SCHED_RATELIMIT_MIN here. CSCHED2_MIN_TIMER and CSCHED2_MAX_TIMER are internal implementation details, and I don't like them exposed (although, indirectly) to the user. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0return rc; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= spin_lock_irqsave(&prv->lock, flags); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= prv->ratelimit_us =3D params->ratelimit_us; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= spin_unlock_irqrestore(&prv->lock, flags); > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= break; > + This is ok. However, the code base changed in the meanwhile (sorry! :- P), and this spin_lock_irqsave() needs to become a write_lock_irqsave(). > @@ -1688,9 +1719,20 @@ csched2_runtime(const struct scheduler *ops, > int cpu, struct csched2_vcpu *snext > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* 1) Run until snext's credit will be= 0 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* 2) But if someone is waiting, run u= ntil snext's credit is > equal > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* to his > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* 3) But never run longer than MAX_TIMER o= r shorter than > MIN_TIMER. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* 3) But never run longer than MAX_TIMER o= r shorter than > MIN_TIMER or > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* run for ratelimit time. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ >=20 I prefer George's version of this comment change: "3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or the ratelimit time." > =C2=A0 > +=C2=A0=C2=A0=C2=A0=C2=A0/* Calculate mintime */ > +=C2=A0=C2=A0=C2=A0=C2=A0min_time =3D CSCHED2_MIN_TIMER; > +=C2=A0=C2=A0=C2=A0=C2=A0if ( prv->ratelimit_us ) { > Coding style. (Parenthesis goes on next line.) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0s_time_t ratelimit_min = =3D prv->ratelimit_us; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0ratelimit_min =3D snext-= >vcpu->runstate.state_entry_time + > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= MICROSECS(prv->ratelimit_us) - now; > Mmm... if you wanted to implement my suggestion from <1468400021.13039.33.camel@citrix.com>,=C2=A0you're definitely missing something: =C2=A0 =C2=A0 =C2=A0s_time_t ratelimit_min =3D prv->ratelimit_us; =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( snext->vcpu->is_running ) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0ratelimit_min =3D sne= xt->vcpu->runstate.state_entry_time + =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= MICROSECS(prv->ratelimit_us) - now; In fact, you're initializing ratelimit_min and then immediately overriding that... I'm surprised the compiler didn't complain. > +=C2=A0=C2=A0=C2=A0=C2=A0if ( ratelimit_min > min_time ) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0min_time =3D ratelimit_m= in; > +=C2=A0=C2=A0=C2=A0=C2=A0} > + > @@ -1707,32 +1749,33 @@ csched2_runtime(const struct scheduler *ops, > int cpu, struct csched2_vcpu *snext > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > =C2=A0 > -=C2=A0=C2=A0=C2=A0=C2=A0/* The next guy may actually have a higher credi= t, if we've > tried to > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* avoid migrating him from a different cpu= .=C2=A0=C2=A0DTRT.=C2=A0=C2=A0*/ > -=C2=A0=C2=A0=C2=A0=C2=A0if ( rt_credit <=3D 0 ) > +=C2=A0=C2=A0=C2=A0=C2=A0/* > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* The next guy ont the runqueue may actual= ly have a higher > credit, > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* if we've tried to avoid migrating him fr= om a different cpu. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Setting time=3D0 will ensure the minimum= timeslice is chosen. > George's draft patch had an empty line within this comment in here, separating the two paragraph. Can we keep it? (I know, this is a very minor thing, but since we're here :-D) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* FIXME: See if we can eliminate this conv= ersion if we know > time > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* will be outside (MIN,MAX).=C2=A0=C2=A0Pr= obably requires pre-calculating > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* credit values of MIN,MAX per vcpu, since= each vcpu burns > credit > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* at a different rate. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > +=C2=A0=C2=A0=C2=A0=C2=A0if (rt_credit > 0) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0time =3D c2t(rqd, rt_cre= dit, snext); > +=C2=A0=C2=A0=C2=A0=C2=A0else > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0time =3D 0; > + > +=C2=A0=C2=A0=C2=A0=C2=A0/* > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Never run longer than MAX_TIMER or less = than MIN_TIMER or for > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* rate_limit time. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > =C2=A0=C2=A0=C2=A0=C2=A0/*=C2=A0 * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER =C2=A0 =C2=A0 =C2=A0* or the ratelimit time. =C2=A0 =C2=A0 =C2=A0*/ i.e., fix the style of the comment (wrt George's patch) by putting the "wings" there, but please, keep the "3)", and the fact that it basically mirrors the big comment at the beginning of the function. > @@ -1746,7 +1789,7 @@ void __dump_execstate(void *unused); > =C2=A0static struct csched2_vcpu * > =C2=A0runq_candidate(struct csched2_runqueue_data *rqd, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0struct csched2_vcpu *scurr, > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0int cpu, s_time_t now) > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0int cpu, s_time_t now, struct csched2_private *prv) > Reviewing v1, George said this: =C2=A0 Since we have the cpu, I think we can get ops this way, without =C2=A0 cluttering things up with the extra argument: =C2=A0 =C2=A0 =C2=A0 struct csched_private *prv =3D CSCHED_PRIV(per_cpu(sch= eduler, cpu)); > @@ -1775,9 +1829,13 @@ runq_candidate(struct csched2_runqueue_data > *rqd, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > =C2=A0 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* If the next one = on the list has more credit than current > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* (or idle, if cur= rent is not runnable), choose it. */ > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* (or idle, if cur= rent is not runnable) and current one has > already > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* executed for mor= e than ratelimit. choose it. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Control has reac= hed here means that current vcpu has > executed > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* ratelimit_us or = ratelimit is off, so chose the next one. > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( svc->credit > = snext->credit ) > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= snext =3D svc; > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0snext =3D svc; > =C2=A0 Both me and George agreed that changing the comment like this is not helping much and should not be done. You also (pointed out already as well) don't need to touch the 'snext =3D svc' line. Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-2fBekMs+xoamgtaQteOy Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXkfcvAAoJEBZCeImluHPugfYP/jQB2iwV3fBL6muD5aIiNG/J tr3T3gyNlO8kwxDPstGlruun9BeuOgrs714y47t2puy8Gqr73Pd6B2lx1u3CcBye uMl9R0XLN+/OmyX1g1SwtBBsq4aUj6aarWq/c7GeV9ONluNiAGbiny8jsjfqPewI 5cbEBU3s4Zu2vXZ0y+s0hQmDUoMG0iL0b3CqNOAaGxfLpHINgmb5YY/jki4AZU/s qKC4IfL1BhUfU9d1XM3GmOjhoWbtW+cSIOAnRCpMs4gHPOJuvoH8SAuqqOaSEEZn MycVL2ML28NMBSUEIzhCx0rXf9EjBUHgc0pbXNJ5jNnqjHUvJ6zSvzk6qJB9VCma hTXqFkuKAixFzHUmU5LzbvXE3c28ravXzXX9/TZMM5cO2Ur804Nb9Yc4Y50HYIQs MBN/l4voZsQ9xtkA48sh3QdbuQIPQ21WA+KPNCR8NQwYSvGZH9zU7ub4mto3ZAxa vaB+VzJytWqTmpjTVRhBnh2xP1luQ0cYOHTDEHJPcrkHk0Nb4pahBAgAiGUrX4O1 K3bu3qKzNEhIcHYoBhtJNC9rFbf6M+ZKSXpwGquwrAp55oXAMkN49BrENnSliCEc nxdSOVk0rYPANQjg0FSu0Y5SkUhuRSN26N8xGGwf12fihUU7/J8vzwPTVGDxA7np 9JVFKZap5tBnPJLFZiq5 =gWeA -----END PGP SIGNATURE----- --=-2fBekMs+xoamgtaQteOy-- --===============6650042841794355202== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============6650042841794355202==--