From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dario Faggioli <dario.faggioli@citrix.com>
Subject: Re: [RFC PATCH 21/24] ARM: vITS: handle INVALL command
Date: Fri, 16 Dec 2016 02:30:18 +0100
Message-ID: <1481851818.3445.390.camel@citrix.com>
References: <20160928182457.12433-1-andre.przywara@arm.com>
 <alpine.DEB.2.10.1611111200180.6951@sstabellini-ThinkPad-X260>
 <alpine.DEB.2.10.1611181028340.3290@sstabellini-ThinkPad-X260>
 <0fce93d4-605b-78b9-9146-b4d65eb4e86a@arm.com>
 <alpine.DEB.2.10.1611301638060.2781@sstabellini-ThinkPad-X260>
 <4a8bb842-bac5-942f-ca84-d223f43ab50b@arm.com>
 <alpine.DEB.2.10.1612021556381.6598@sstabellini-ThinkPad-X260>
 <b34595dc-1259-43a2-7d88-ab06fed50a30@arm.com>
 <alpine.DEB.2.10.1612051136110.6598@sstabellini-ThinkPad-X260>
 <de7d90fa-b4ea-d253-e631-3c4199210aa1@arm.com>
 <alpine.DEB.2.10.1612061132500.6598@sstabellini-ThinkPad-X260>
 <1481059976.3445.98.camel@citrix.com>
 <alpine.DEB.2.10.1612061347360.6598@sstabellini-ThinkPad-X260>
 <alpine.DEB.2.10.1612061356000.6598@sstabellini-ThinkPad-X260>
 <dae9a4d7-6333-b56f-db6c-688ef9a6c60c@arm.com>
 <alpine.DEB.2.10.1612071048580.22778@sstabellini-ThinkPad-X260>
 <09a541d6-a6ec-180d-0f24-400fbb0a3ea4@arm.com>
 <alpine.DEB.2.10.1612091204400.22778@sstabellini-ThinkPad-X260>
 <A5AE4178-19E0-44C7-B751-034EC37EF21F@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1155970724603855794=="
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta6.messagelabs.com ([193.109.254.103])
 by lists.xenproject.org with esmtp (Exim 4.84_2)
 (envelope-from <prvs=15156663b=dario.faggioli@citrix.com>)
 id 1cHhLx-0001u1-Ty
 for xen-devel@lists.xenproject.org; Fri, 16 Dec 2016 01:30:30 +0000
In-Reply-To: <A5AE4178-19E0-44C7-B751-034EC37EF21F@citrix.com>
List-Unsubscribe: <https://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <https://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>
To: George Dunlap <George.Dunlap@citrix.com>, Stefano Stabellini <sstabellini@kernel.org>
Cc: Andre Przywara <andre.przywara@arm.com>, Julien Grall <julien.grall@arm.com>, Steve Capper <Steve.Capper@arm.com>, Vijay Kilari <vijay.kilari@gmail.com>, xen-devel <xen-devel@lists.xenproject.org>
List-Id: xen-devel@lists.xenproject.org

--===============1155970724603855794==
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="=-XwHX39RGjYmgFy2tLKBo"

--=-XwHX39RGjYmgFy2tLKBo
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, 2016-12-14 at 03:39 +0100, George Dunlap wrote:
> > On Dec 10, 2016, at 4:18 AM, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > > > The issue with spreading interrupts migrations over time is that
> > > > it makes interrupt latency less deterministic. It is OK, in the
> > > > uncommon case of vCPU migration with interrupts, to take a hit
> > > > for a short time. This "hit" can be measured. It can be known. If
> > > > your workload cannot tolerate it, vCPUs can be pinned. It should
> > > > be a rare event anyway. On the other hand, by spreading
> > > > interrupts migrations, we make it harder to predict latency.
> > > > Aside from determinism, another problem with this approach is
> > > > that it ensures that every interrupt assigned to a vCPU will
> > > > first hit the wrong pCPU, then it will be moved. It guarantees
> > > > the worst-case scenario for interrupt latency for the vCPU that
> > > > has been moved. If we migrated all interrupts as soon as
> > > > possible, we would minimize the amount of interrupts delivered to
> > > > the wrong pCPU. Most interrupts would be delivered to the new
> > > > pCPU right away, reducing interrupt latency.
>
> Another approach which one might take:
> 3. Eagerly migrate a subset of the interrupts and lazily migrate the
> others.  For instance, we could eagerly migrate all the interrupts
> which have fired since the last vcpu migration.  In a system where
> migrations happen frequently, this should only be a handful; in a
> system that migrates infrequently, this will be more, but it won't
> matter, because it will happen less often.
>
Yes, if doable (e.g., I don't know how easy and practical it is to know
and keep track of fired interrupts), this looks like a good solution to
me too.
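
To make the bookkeeping question a bit more concrete, here is a minimal
sketch of what "keep track of fired interrupts, and eagerly migrate only
those" could look like. It is plain Python, purely for illustration:
the class, its fields and its methods are all made up by me, and none
of it is Xen code or reflects how the vITS/vGIC actually stores things.

```python
class VcpuIrqTracker:
    """Hypothetical per-vcpu bookkeeping; not a Xen data structure."""

    def __init__(self, routed_irqs):
        # All interrupts currently routed to this vcpu.
        self.routed = set(routed_irqs)
        # Interrupts that fired since the last vcpu migration.
        self.fired = set()
        # Interrupts already targeting the pcpu the vcpu runs on.
        self.on_new_pcpu = set(routed_irqs)

    def on_interrupt(self, irq):
        """Record a delivery; a lazily handled irq is re-targeted now."""
        if irq not in self.routed:
            return
        self.fired.add(irq)
        self.on_new_pcpu.add(irq)

    def on_vcpu_migrate(self):
        """Eagerly move the recently fired irqs; leave the rest lazy."""
        eager = self.fired & self.routed
        self.on_new_pcpu = set(eager)
        self.fired = set()
        return sorted(eager)
```

The cost at migration time is then proportional to the size of the
"fired" set, not to the total number of routed interrupts, which is
the property George describes above.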

> So at the moment, the scheduler already tries to avoid migrating
> things *a little bit* if it can (see migrate_resist).  It's not clear
> to me at the moment whether this is enough or not.
>
Well, true, but migration resistance, in Credit2, is just a fixed value
which:
 1. is set at boot time;
 2. is always the same for all vcpus;
 3. is always the same, no matter what a vcpu is doing.

And even if we make it tunable and changeable at runtime (which I
intend to do), it's still something pretty "static" because of 2 and 3.

And even if we make it tunable per-vcpu (which is doable), it would be
rather hard to decide what value to set it to, for each vcpu. And, of
course, 3 would still apply (i.e., it would not change according to the
vcpu's workload or characteristics).

So, it's guessing. More or less fine grained, but always guessing.

On the other hand, using something proportional to the number of routed
interrupts as the migration resistance threshold would overcome all of
1, 2 and 3. It would give us a migrate_resist value which is adaptive,
and is determined according to the actual workload or properties of a
specific vcpu.
Feeding routed interrupt info to the load balancer comes from similar
reasoning (and we actually may want to do both).
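
Just to show the shape of the idea, a rough sketch follows (Python, for
illustration only). The function names, the linear formula, the cap and
all the numbers are my own assumptions, picked arbitrarily; this is not
how Credit2 computes anything today.

```python
def adaptive_migrate_resist(base_us, nr_routed_irqs, per_irq_us, cap_us):
    """Resistance grows linearly with routed interrupts, up to a cap."""
    return min(base_us + nr_routed_irqs * per_irq_us, cap_us)


def should_migrate(load_gain_us, nr_routed_irqs,
                   base_us=500, per_irq_us=100, cap_us=5000):
    """Migrate only if the expected gain exceeds the resistance."""
    resist = adaptive_migrate_resist(base_us, nr_routed_irqs,
                                     per_irq_us, cap_us)
    return load_gain_us > resist
```

With these (arbitrary) defaults, a vcpu with no routed interrupts moves
for a gain of 1000us, while a vcpu with 8 routed interrupts stays put
for the same gain, since its resistance is 1300us. The cap is there so
that a vcpu with a lot of interrupts does not become impossible to
move.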

FTR, Credit1 has a similar mechanism, i.e., it makes *even wilder
guesses* about whether a vcpu could still have some of its data in
cache, and tries not to migrate it if that's likely (see
__csched_vcpu_is_cache_hot()).
We can improve that too, although it is a lot more complex and less
predictable, as usual with Credit1.
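
For reference, a time-based guess of this kind can be sketched as below
(Python, illustration only). I am assuming the heuristic is "it ran
recently enough, so it is probably still cache hot"; the function name,
parameters and units are illustrative and not copied from the Xen
source.

```python
def is_cache_hot(now_ns, last_run_ns, migration_delay_us):
    """Guess: hot if the vcpu last ran less than the delay ago."""
    return (now_ns - last_run_ns) < migration_delay_us * 1000
```

Note that nothing in there looks at what the vcpu was actually doing,
which is why I call it a guess.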

> Or to put it a different way -- how long should the scheduler try to
> wait before moving one of these vcpus?
>
Yep, it's similar to the "anticipation" problem in I/O schedulers
(where "excessive seeks" ~= "too frequent migrations").

 https://en.wikipedia.org/wiki/Anticipatory_scheduling

> At the moment I haven't seen a good way of calculating this.
>
Exactly, and basing the calculation on the number of routed interrupts
--and, if possible, other metrics too-- could be that "good way" we're
looking for.

It would need experimenting, of course, but I like the idea.

> #3 to me has the feeling of being somewhat more satisfying, but also
> potentially fairly complicated.  Since the scheduler already does
> migration resistance somewhat, #1 would be simpler to implement in
> the short run.  If it turns out that #1 has other drawbacks, we can
> implement #3 as and when needed.
>
> Thoughts?
>
Yes, we can do things incrementally, which is always good. I like your
#1 proposal because it has the really positive side effect of bringing
us into the camp of adaptive migration resistance, which is something
pretty advanced and pretty cool, if we manage to do it right. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
--=-XwHX39RGjYmgFy2tLKBo
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAABCAAGBQJYU0OrAAoJEBZCeImluHPuBNIQANGUvlO83JoKVSDjO4gI76Cz
4wakl+n1U4t/Kwt5Lx52OXh1ssoK8T2rVV7tavxC4CF6tZzv+BAIM7SQxzO+d36c
gGSWGSwTzVelYLes9vCUlEbX+L47sUp+H27ji1ZUfgFHtEuyDFfBgL4qlYi4T+Dv
3GGp7cLZYpyUaFE15rWxLLBay10LMNoGDbSctAZCSeaSl6/xTv1YxJ0tPIlApFaf
NB3sdjTpNaghmO6rpIBaxIHFUwyGrhi6O2+zPPJH1kgc12pZe/dJVMcvJBco6oN4
JZeweQ29WccbKDJQw02cBJHwikDQAvigwoX0jaL50mUP8C20X4KMJwoFYwxAK9MX
E9Pkd4HlbB0qAAjKD2B9+KFwP7wEId9ZZ8HZP0TYZ1GBAD+lsKNusv3pDkVi6nkw
Qe8EuSeMr1IYBYKZf9PyLleuoWYri+EWJV5s/ZX1byJZ085FWg0f1oQewsRGB2rk
nDTrtvdJBVi+neeU41ZhfFvd+PXDUTzN3k8svHbd/wNKfKzxMqmj9V30MsmvalxW
Ym9Z3veku0QkEyfKog+OAzFBum9B3aIhrXbVllW8+DpYzOs2X1CTVBs8xSsBbK9K
XWbpiFbK6E3GJunuF5+rsP/uBlFFN1F1w3rx6SfsZk03CIY/yEdUBgZwJg87bboi
rXgl3yM2GA/RJklyhQN+
=2O5a
-----END PGP SIGNATURE-----

--=-XwHX39RGjYmgFy2tLKBo--


--===============1155970724603855794==--