From: Dario Faggioli
Subject: Re: [PATCH 06/24] xen: credit2: implement yield()
Date: Thu, 29 Sep 2016 18:05:34 +0200
To: George Dunlap, xen-devel@lists.xenproject.org
Cc: George Dunlap, Andrew Cooper, Anshul Makkar, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Tue, 2016-09-13 at 14:33 +0100, George Dunlap wrote:
> On 17/08/16 18:18, Dario Faggioli wrote:
> > Alternatively, we can actually _subtract_ some credits from a
> > yielding vcpu. That will sort of make the effect of a call to
> > yield last in time.
> But normally we want the yield to be temporary, right?  The kinds of
> places it typically gets called is when the vcpu is waiting for a
> spinlock held by another (probably pre-empted) vcpu.  Doing a
> permanent credit subtraction will bias the credit algorithm against
> cpus that have a high amount of spinlock contention (since probably
> all the vcpus will be calling yield pretty regularly).

Yes, indeed. Good point, actually. However, one can also think of a
scenario where:
 - A yields, and is descheduled in favour of B, as a consequence of
   that;
 - B runs for just a little while and then blocks;
 - C and A are in the runqueue, and A, without counting the idle bias,
   has more credit than C. So A will be picked up again, even if it
   yielded very recently, and it may still be in the spinlock wait (or
   whatever place it is yielding from in a tight loop).

Well, in this case, A will yield again, and C will be picked, i.e.,
what would have happened in the first place, if we had subtracted
credits from A. (I.e., functionally, this would work the same way,
just with more overhead.)

So, again, can this happen? How frequently, both in absolute and
relative terms? Very hard to tell! So, really...

> Yes, this is simple and should be effective for now.  We can look at
> improving it later.

...glad you also think this. Let's go for it. :-)

> > diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> > @@ -1389,6 +1389,16 @@ Choose the default scheduler.
> >  ### sched\_credit2\_migrate\_resist
> >  > `= `
> >  
> > +### sched\_credit2\_yield\_bias
> > +> `= `
> > +
> > +> Default: `1000`
> > +
> > +Set how much a yielding vcpu will be penalized, in order to
> > +actually give a chance to run to some other vcpu. This is
> > +basically a bias, in favour of the non-yielding vcpus, expressed
> > +in microseconds (default is 1ms).

> Probably add _us to the end to indicate that the number is in
> microseconds.

Good idea, although right now we have "sched_credit2_migrate_resist",
which does not have the suffix. Still, I'm doing as you suggest,
because I like it better, and we'll fix "migrate_resist" later, if we
want consistency.

> > @@ -2247,10 +2267,22 @@ runq_candidate(struct csched2_runqueue_data *rqd,
> >      struct list_head *iter;
> >      struct csched2_vcpu *snext = NULL;
> >      struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
> > +    int yield_bias = 0;
> >  
> >      /* Default to current if runnable, idle otherwise */
> >      if ( vcpu_runnable(scurr->vcpu) )
> > +    {
> > +        /*
> > +         * The way we actually take yields into account is like this:
> > +         * if scurr is yielding, when comparing its credits with other
> > +         * vcpus in the runqueue, act like those other vcpus had
> > +         * yield_bias more credits.
> > +         */
> > +        if ( unlikely(scurr->flags & CSFLAG_vcpu_yield) )
> > +            yield_bias = CSCHED2_YIELD_BIAS;
> > +
> >          snext = scurr;
> > +    }
> >      else
> >          snext = CSCHED2_VCPU(idle_vcpu[cpu]);
> >  
> > @@ -2268,6 +2300,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
> >      list_for_each( iter, &rqd->runq )
> >      {
> >          struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
> > +        int svc_credit = svc->credit + yield_bias;

> Just curious, why did you decide to add yield_bias to everyone else,
> rather than just subtracting it from snext->credit?

I honestly don't recall. :-) It indeed feels more natural to subtract
it from snext. I've done it that way now; let me give it a test spin
and resend...
> > @@ -2918,6 +2957,14 @@ csched2_init(struct scheduler *ops)
> >      printk(XENLOG_INFO "load tracking window lenght %llu ns\n",
> >             1ULL << opt_load_window_shift);
> >  
> > +    if ( opt_yield_bias < CSCHED2_YIELD_BIAS_MIN )
> > +    {
> > +        printk("WARNING: %s: opt_yield_bias %d too small, resetting\n",
> > +               __func__, opt_yield_bias);
> > +        opt_yield_bias = 1000; /* 1 ms */
> > +    }

> Why do we need a minimum bias?  And why reset it to 1ms rather than
> SCHED2_YIELD_BIAS_MIN?

You know what, I don't think we need that. I probably was thinking
that we may always want to force yield to have _some_ effect, but
there may be (or may well be) someone who just wants to disable it
entirely... and in that case, this check would be in their way. I'll
kill it.
Thanks and regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)