From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dario Faggioli <dario.faggioli@citrix.com>
Subject: Re: [PATCH] xen/arm: introduce vwfi parameter
Date: Sat, 18 Feb 2017 02:47:43 +0100
Message-ID: <1487382463.6732.146.camel@citrix.com>
References: <1487286292-29502-1-git-send-email-sstabellini@kernel.org>
 <a271394a-6c76-027c-fb08-b3fe775224ba@arm.com>
 <alpine.DEB.2.10.1702171442320.9566@sstabellini-ThinkPad-X260>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============7648485414384225546=="
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
 by lists.xenproject.org with esmtp (Exim 4.84_2)
 (envelope-from <prvs=2150f978e=dario.faggioli@citrix.com>)
 id 1ceu7z-0005Dx-EF
 for xen-devel@lists.xenproject.org; Sat, 18 Feb 2017 01:47:59 +0000
In-Reply-To: <alpine.DEB.2.10.1702171442320.9566@sstabellini-ThinkPad-X260>
List-Unsubscribe: <https://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <https://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>
To: Stefano Stabellini <sstabellini@kernel.org>, Julien Grall <julien.grall@arm.com>
Cc: edgar.iglesias@xilinx.com, george.dunlap@eu.citrix.com, nd@arm.com, xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

--===============7648485414384225546==
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="=-8nJnxibrrFGXsFSJBdKJ"

--=-8nJnxibrrFGXsFSJBdKJ
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, 2017-02-17 at 14:50 -0800, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Julien Grall wrote:
> > Please explain in which context this will be beneficial. My gut
> > feeling is
> > only will make performance worst if a multiple vCPU of the same
> > guest is
> > running on vCPU
>
> I am not a scheduler expert, but I don't think so. Let me explain the
> difference:
>
> - vcpu_block blocks a vcpu until an event occurs, for example until it
>   receives an interrupt
>
> - vcpu_yield stops the vcpu from running until the next scheduler slot
>
So, what happens when you yield depends on how yield is implemented in
the specific scheduler, and on what other vcpus are runnable in the
system.

Currently, neither Credit1 nor Credit2 (nor the Linux scheduler,
AFAICR) really stops a yielding vcpu. Broadly speaking, the following
two scenarios are possible:
 - vcpu A yields, and there are one or more other vcpus that are
   runnable but not already running. In this case, A is indeed
   descheduled and put back in a scheduler runqueue in such a way that
   one or more of those other vcpus have a chance to execute before
   the scheduler considers A again. This may be implemented by putting
   A on the tail of the runqueue, so all the other vcpus will get a
   chance to run (this is basically what happens in Credit1, modulo
   periodic runq sorting). Or it may be implemented by ignoring A for
   the next <number> scheduling decisions after it yielded (this is
   basically what happens in Credit2). Both approaches have pros and
   cons, but the common bottom line is that others are given a chance
   to run.

 - vcpu A yields, and there are no runnable but not running vcpus
   around. In this case, A gets to run again. Full stop.

And when a vcpu that has yielded is picked up again for execution,
either immediately or after a few others, it can run again. And if it
yields again (and again, and again), we just go back to scenario 1 or
2 above.
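
To make the two scenarios above a bit more concrete, here is a toy
sketch in Python. It is not Xen code: the function names, the plain
deque used as a runqueue and the `skip` parameter are all made up for
illustration, and the real Credit1/Credit2 logic is considerably more
involved (credits, priorities, runq sorting, ...).

```python
from collections import deque


def pick_after_yield_tail(runq, yielder):
    """Toy 'Credit1-like' policy (assumption, not the real code):
    the yielder is requeued at the tail, behind every waiting vcpu."""
    runq.append(yielder)
    return runq.popleft()


def pick_after_yield_skip(runq, yielder, skip=1):
    """Toy 'Credit2-like' policy (assumption, not the real code):
    up to `skip` waiting vcpus are allowed to go ahead of the yielder."""
    ahead = min(skip, len(runq))
    runq.insert(ahead, yielder)
    return runq.popleft()


if __name__ == "__main__":
    # Scenario 1: B and C are runnable but not running, A yields.
    print(pick_after_yield_tail(deque(["B", "C"]), "A"))
    print(pick_after_yield_skip(deque(["B", "C"]), "A", skip=1))
    # Scenario 2: nobody else is runnable, so A is picked again.
    print(pick_after_yield_tail(deque(), "A"))
    print(pick_after_yield_skip(deque(), "A"))
```

In both toy policies the yielder is never blocked: it stays runnable,
and it is simply picked again as soon as nobody else is ahead of it.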

> In both cases the vcpus is not run until the next slot, so I don't
> think
> it should make the performance worse in multi-vcpus scenarios. But I
> can
> do some tests to double check.
>
All the above being said, I also don't think it will affect the
performance of multi-vcpu VMs much. In fact, even if the yielding vcpu
is never really stopped, the other ones are indeed given a chance to
execute if they want to and are able to.

But it certainly would not hurt to verify this with some tests.

> > The main point of using wfi is for power saving. With this change,
> > you will
> > end up in a busy loop and as you said consume more power.
>
> That's not true: the vcpu is still descheduled until the next slot.
> There is no busy loop (that would be indeed very bad).
>
Well, as a matter of fact there may be busy-looping involved... But
isn't that the whole point of this? AFAIR, idle=poll in Linux does
very much the same, and has the same risk of potentially letting tasks
busy loop.

What will never happen is that a yielding vcpu, by busy looping,
prevents other runnable (and non-yielding) vcpus from running. And if
it does, it's a bug. :-)
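
A rough way to see this is the following toy simulation in Python. It
is only a sketch under simplifying assumptions: a single pCPU, plain
round-robin, a fixed timeslice, and a yielding vcpu that gives up the
pCPU after one time unit. The function and parameter names are
hypothetical and none of this is taken from the Xen schedulers.

```python
from collections import deque


def simulate(vcpus, total_time, timeslice=10, yield_after=1):
    """Return the CPU time each vcpu gets on one pCPU.

    `vcpus` maps a vcpu name to True if it yields shortly after being
    scheduled, or False if it uses its whole timeslice. It must not be
    empty. All values are toy time units.
    """
    runq = deque(vcpus)
    cpu_time = {name: 0 for name in vcpus}
    elapsed = 0
    while elapsed < total_time:
        name = runq.popleft()
        # A yielder runs briefly and then yields; others run a full slice.
        ran = yield_after if vcpus[name] else timeslice
        ran = min(ran, total_time - elapsed)
        cpu_time[name] += ran
        elapsed += ran
        # Yielding or preempted, the vcpu stays runnable: back to the tail.
        runq.append(name)
    return cpu_time


if __name__ == "__main__":
    # A yields, B has real work to do: B is not starved by A.
    print(simulate({"A": True, "B": False}, total_time=110))
    # A is alone: it effectively busy loops, but nobody else wanted the pCPU.
    print(simulate({"A": True}, total_time=50))
```

In this toy model the yielder only burns the pCPU when it is the only
runnable vcpu; as soon as another vcpu wants to run, it gets most of
the time.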

> > I don't think this is acceptable even to get a better interrupt
> > latency. Some
> > workload will care about interrupt latency and power.
> >
> > I think a better approach would be to check whether the scheduler
> > has another
> > vCPU to run. If not wait for an interrupt in the trap.
> >
> > This would save the context switch to the idle vCPU if we are still
> > on the
> > time slice of the vCPU.
>
> From my limited understanding of how schedulers work, I think this
> cannot work reliably. It is the scheduler that needs to tell the
> arch-specific code to put a pcpu to sleep, not the other way around.
>
Yes, that is basically true.

Another way to explain it would be by saying that, if there were other
vCPUs to run, we wouldn't have gone idle (and entered the idle loop).

In fact, in work conserving schedulers, if pCPU x becomes idle, it
means there is _nothing_ around that can execute on x. And our
schedulers are (with the exception of ARINC653, and if not using caps
in Credit1) work conserving, or at least they want and try to be as
work conserving as possible.
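
As a small illustration of what "work conserving" means here, consider
this Python sketch. It is a toy model with hypothetical names
(`pick_next`, `pick_next_with_cap`, `IDLE`), not how any Xen scheduler
is actually written, and the cap handling is heavily simplified.

```python
from collections import deque

IDLE = "idle"  # stands for the idle vcpu of the pCPU


def pick_next(runq):
    """Work conserving: go idle only when the runqueue is empty."""
    return runq.popleft() if runq else IDLE


def pick_next_with_cap(runq, used, cap):
    """Not work conserving: a vcpu that has used up its cap is skipped,
    so the pCPU may go idle even though a runnable vcpu exists."""
    for _ in range(len(runq)):
        vcpu = runq.popleft()
        if used.get(vcpu, 0) < cap.get(vcpu, float("inf")):
            return vcpu
        runq.append(vcpu)  # over its cap: keep it queued, try the next one
    return IDLE


if __name__ == "__main__":
    print(pick_next(deque(["A"])))   # A runs, the pCPU does not go idle
    print(pick_next(deque()))        # nothing runnable, so idle
    # A is runnable but capped: the pCPU idles anyway.
    print(pick_next_with_cap(deque(["A"]), used={"A": 5}, cap={"A": 5}))
```

With the first function, seeing the idle vcpu picked implies that the
runqueue was empty, which is the point made above.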

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
--=-8nJnxibrrFGXsFSJBdKJ
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAABCAAGBQJYp6fAAAoJEBZCeImluHPuGt8QAIz+00nfSvtRy8Il5HYqxpcF
fIsHsCrg0e/d4ztbUDySkhQfQ4CX4zJ0s0rA787AbYRt88PzYJt+b9U9iDjRrp3m
m2byLqGz50GSniExzuPO1OUWfyeuiXAkAtXTYzDL6LA+wcUfSCTUc6EhUt0FYbv5
FN890EKwUmnLV0hsAXeFWpCg1E4ZJI8iQO3WFjeC6eUP7xPLGKMhuH4xXXK4V+P2
NwdhXtU7BHBXqwB3orOetWFcP/wHFY44+XzfPpBtpVPY3mb1OO1+dkUjad+R0Klk
VKjl9y+rC6HLem6WN6zxuZb5HUdZQNPL/Zem6aiwyniPoKfuZjH/nj6Kwelmoiqj
D66YBUsRYo1IWgdVH15AWjEum+opAoZsWjuFL1LzmadhBldqTLvf7k3vCE8GgPAm
ZnpVdf7gzzw15rXzuFvy6k8o5blonBKMfm1iUzwcFCCR8O5exXcftzqpuryL9bkX
kPvwmdcHNvstFfU/aTEqxXVmxeZxWE4iOUc026AKR1KwXOTtlhz5XOcOCuFfYBCu
y4bSn8gZo2PhvKyOBu/TODGdkfPZWykMpW1T0qigB17CN3eCsdFHVGm6z5QxQAse
VEvuhcOfV3ULSabovgqAx8rL3JNCPBEFva6y5Zp8+QfDvCccMvOL2sovxLwwb6Vm
ElJsgmwcyQ2j8YQr2XfF
=8Ex6
-----END PGP SIGNATURE-----

--=-8nJnxibrrFGXsFSJBdKJ--


--===============7648485414384225546==--