From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dario Faggioli
Subject: Re: [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing
Date: Mon, 5 Sep 2016 14:49:27 +0200
Message-ID: <1473079767.19612.23.camel@citrix.com>
References: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
 <147145438726.25877.12520091608250776214.stgit@Solace.fritz.box>
 <57C96679.3000902@citrix.com>
In-Reply-To: <57C96679.3000902@citrix.com>
To: anshul makkar, xen-devel@lists.xenproject.org
Cc: George Dunlap
List-Id: xen-devel@lists.xenproject.org

On Fri, 2016-09-02 at 12:46 +0100, anshul makkar wrote:

Hey, Anshul,

Thanks for having a look at the patch!

> On 17/08/16 18:19, Dario Faggioli wrote:
> >
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> >
> > + * Basically, if a soft-affinity is defined, the work done by a vcpu on a
> > + * runq to which it has higher degree of soft-affinity, is considered
> > + * 'lighter' than the same work done by the same vcpu on a runq to which it
> > + * has smaller degree of soft-affinity (degree of soft affinity is <= 1).
> > + * In fact, if soft-affinity is used to achieve NUMA-aware scheduling, the
> > + * higher the degree of soft-affinity of the vcpu to a runq, the greater
> > + * the probability of accessing local memory, when running on such runq.
> > + * And that is certainly 'lighter' than having to fetch memory from remote
> > + * NUMA nodes.
> Do we ensure that while defining soft-affinity for a vcpu, NUMA
> architecture is considered. If not, then this whole calculation can go
> wrong and have negative impact on performance.
>
Defining soft-affinity after topology is what we do by default, just not
here in Xen: we do it in the toolstack (in libxl, to be precise).

NUMA aware scheduling is indeed the most obvious use case for all this
--and, in fact, that's why we configure things in such a way in higher
layers-- but the mechanism is, at the Xen level, flexible enough to be
used for any purpose that the user may find interesting.

> Degree of affinity to runq will give good result if the affinity to
> pcpus has been chosen after due consideration ..
>
At this level, 'good result' means 'making sure that a vcpu runs for as
much time as possible on a pcpu to which it has soft-affinity'.

Whether that is good or not for performance (or for any other aspect or
metric), it's not this algorithm's job to determine.

Note that things are exactly the same for hard-affinity/pinning, or for
weights. In fact, Xen won't stop one from, say, pinning 128 vcpus all to
pcpu 3. This will deeply suck, but it's the higher layers' will (fault?)
and Xen should just comply with that.

> > + * If there is no soft-affinity, load_balance() (actually, consider()) acts
> > + * as follows:
> > + *
> > + *  - D = abs(Li - Lj)
> If we are consider absolute of Li -Lj, how will we know which runq has
> less workload which, I think, is an essential parameter for load
> balancing. Am I missing something here ?
>
What we are aiming for is making the queues more balanced, which means
we want the difference between their loads to be smaller than it was
when the balancing started. As long as that happens, we don't care which
load goes down and which one goes up: all that matters is that the final
result is a smaller load delta.

> > + *  - consider pushing v from I to J:
> > + *     - D' = abs(Li - lv - (Lj + lv))   (from now, abs(x) == |x|)
> > + *     - if (D' < D) { push }
> > + *  - consider pulling k from J to I:
> > + *     - D' = |Li + lk - (Lj - lk)|
> > + *     - if (D' < D) { pull }
> For both push and pull we are checking (D` < D) ?
>
Indeed. And that's because of the abs(). :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
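
For readers following the thread, the push/pull check from the quoted comment (the case with no soft-affinity) can be sketched in C roughly as below. This is only an illustration of the arithmetic, not the code in sched_credit2.c: the names load_delta(), should_push() and should_pull() are hypothetical, loads are assumed to be plain signed integers, and the soft-affinity weighting that the patch adds is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: D = |Li - Lj|, the load delta between two runqueues. */
static int64_t load_delta(int64_t li, int64_t lj)
{
    return li > lj ? li - lj : lj - li;
}

/*
 * Consider pushing a vcpu with load lv from runqueue I to runqueue J:
 *   D' = |(Li - lv) - (Lj + lv)|
 * The push is worthwhile only if D' < D.
 */
static bool should_push(int64_t li, int64_t lj, int64_t lv)
{
    return load_delta(li - lv, lj + lv) < load_delta(li, lj);
}

/*
 * Consider pulling a vcpu with load lk from runqueue J to runqueue I:
 *   D' = |(Li + lk) - (Lj - lk)|
 * The pull is worthwhile only if D' < D.
 */
static bool should_pull(int64_t li, int64_t lj, int64_t lk)
{
    return load_delta(li + lk, lj - lk) < load_delta(li, lj);
}
```

With Li = 10, Lj = 4 and a vcpu load of 2, D = 6; pushing gives D' = |8 - 6| = 2, so should_push() accepts, while pulling gives D' = |12 - 2| = 10, so should_pull() refuses. Swapping the two loads flips both answers, which is why the same (D' < D) test serves both directions once abs() is applied.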