From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH 05/24] xen: credit2: make tickling more deterministic Date: Fri, 30 Sep 2016 04:22:09 +0200 Message-ID: <1475202129.5315.70.camel@citrix.com> References: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box> <147145428827.25877.8612560340138019986.stgit@Solace.fritz.box> <2cf45a94-2e9a-3cbb-4bde-9850dc0af139@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2316229833254612502==" Return-path: Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bpnSt-0008Di-5J for xen-devel@lists.xenproject.org; Fri, 30 Sep 2016 02:22:19 +0000 In-Reply-To: <2cf45a94-2e9a-3cbb-4bde-9850dc0af139@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: George Dunlap , xen-devel@lists.xenproject.org Cc: Andrew Cooper , Anshul Makkar , Jan Beulich List-Id: xen-devel@lists.xenproject.org --===============2316229833254612502== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-TCqHaw0rURoqf3L3GOMA" --=-TCqHaw0rURoqf3L3GOMA Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, 2016-09-13 at 12:28 +0100, George Dunlap wrote: > On 17/08/16 18:18, Dario Faggioli wrote: > >=C2=A0 > diff --git a/xen/common/sched_credit2.c > > @@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused); > > =C2=A0static struct csched2_vcpu * > > =C2=A0runq_candidate(struct csched2_runqueue_data *rqd, > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0struct csched2_vcpu *scurr, > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0int cpu, s_time_t now) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0int cpu, s_time_t now, > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0unsigned int *pos) >=20 > I think I'd prefer if this were called "skipped" or something like > that > -- to indicate how many vcpus in the runqueue had been skipped before > coming to this one. >=20 Done. > > @@ -2298,6 +2340,7 @@ csched2_schedule( > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct csched2_runqueue_data *rqd; > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct csched2_vcpu * const scurr =3D CSC= HED2_VCPU(current); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct csched2_vcpu *snext =3D NULL; > > +=C2=A0=C2=A0=C2=A0=C2=A0unsigned int snext_pos =3D 0; > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct task_slice ret; > > =C2=A0 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0SCHED_STAT_CRANK(schedule); > > @@ -2347,7 +2390,7 @@ csched2_schedule( > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0snext =3D CSCHED2= _VCPU(idle_vcpu[cpu]); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0else > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0snext=3Drunq_candidate= (rqd, scurr, cpu, now); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0snext =3D runq_candida= te(rqd, scurr, cpu, now, &snext_pos); > > =C2=A0 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* If switching from a non-idle runnable = vcpu, put it > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* back on the runqueue. */ > > @@ -2371,8 +2414,21 @@ csched2_schedule( > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0__set_bit(__CSFLAG_scheduled, &snext->flags); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > > =C2=A0 > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Check for the reset= condition */ > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( snext->credit <= =3D CSCHED2_CREDIT_RESET ) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* The reset cond= ition is "has a scheduler epoch come to > > an end?". > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* The way this i= s enforced is checking whether the vcpu > > at the top > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* of the runqueu= e has negative credits. This means the > > epochs have > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* variable lengh= t, as in one epoch expores when: > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*=C2=A0=C2=A01) = the vcpu at the top of the runqueue has executed > > for > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0around 10 ms (with default parameters); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*=C2=A0=C2=A02) = no other vcpu with higher credits wants to run. > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Here, where we= want to check for reset, we need to make > > sure the > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* proper vcpu is= being used. In fact, > > runqueue_candidate() may have > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* not returned t= he first vcpu in the runqueue, for > > various reasons > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* (e.g., affinit= y). Only trigger a reset when it does. > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( snext_pos =3D=3D = 0 && snext->credit <=3D > > CSCHED2_CREDIT_RESET ) >=20 > This bit wasn't mentioned in the description. :-) >=20 You're right. Actually, I think this change deserves to be in its own patch, so in v2 I'm splitting this patch in two. > There's a certain amount of sense to the idea here, but it's the kind > of > thing that may have strange side effects.=C2=A0=C2=A0Did you look at trac= es > before > and after this change?=C2=A0=C2=A0And does the behavior seem more rationa= l? >=20 I have. It's not like it was happening a lot of times that we were resetting upon the wrong vcpus, but I indeed have caught a couple of examples. And yes, the trace looked more 'regular' with this patch. Or, IOW, without this patch, there were some of the reset events that were suspiciously closer between each other. TBH, in the vast majority of the cases, even when a "spurious reset" was involved, the difference was rather hard to tell, but please, consider that the combination of hard-affinity, this patch and soft- affinity will potentially make things much worse (and in fact, I saw the most severe occurrences when using hard-affinity). It's also rather hard to measure the effect, but I think what is implemented here is the right thing to do. And even if it may be hard to measure the performance impact, I claim that this is a 'correctness' issue, or at least a matter of adhering as much as possible to the algorithm theory and idea. > If so, I'm happy to trust your judgement -- just want to check to > make > sure. :-) >=20 Ah, thanks. :-) Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-TCqHaw0rURoqf3L3GOMA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJX7cxSAAoJEBZCeImluHPukvYQALCwep8sVVEomEY2Mt71At7X D9lAOWToAatiiLi7v10/UjR7SvgCNbxNCv9AnnDz1nk3R/Snb+hfwkQmAfRL23Id 083yoRFqN5UFp4Xe8g1rI+f0vnbTj5QnqKlPMKs5z282vC39Dg8LfXV0DuwdW62r pQerQgOYMkwaFriCA5b93Qs8Bq2DyAffvjK+fuKRj8amsmM8WTe00FfM67/+90Go eqNUWtAuFaLV9iahgJDDYPHrqINL8SPqvE7dgP5Bd6WtfszaoCokXkVgk9hx5RMj 5+zddC972ZLE73OxrEhI5iFqkQlMbTPL2sSicSw39iohsZC6lnvQb2BIE+pcCt3K NOojncXCDsvvnutF5V9CJ2knXxInwr4lBCJstkWOkkZ/OyHNb9fgyZ7JD7SN/UsQ HWONi1jE0cm4PUfXe0w5p3zS7oOrhkFuAiir7Wv96Ku7Z7utpqo3QL0TBPuBS5ba Qhfieh2RQbMbqxKHAl1yT2OiHrsHZdinXG3AcH/k7fh5J5cjVx/nl5XH4rneswoa gJdjFM9pGmcizrAB0GocwyJG6qBNEhMWI/IU2h6Eh2Cn9QM7rNJVSStfOgtzOPHd ARuWLt22TytqloKSo2bqMAIpqLwkgL/cKZQKlMvKFthoHD2xSWhXDBpQUgPmmKfr w4YvmvXQ7dPgU2UlrwuH =19Pu -----END PGP SIGNATURE----- --=-TCqHaw0rURoqf3L3GOMA-- --===============2316229833254612502== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============2316229833254612502==--