From: Shriram Rajagopalan
Reply-To: rshriram@cs.ubc.ca
To: Ian Jackson
Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel@lists.xen.org
Subject: Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
Date: Mon, 4 Nov 2013 10:47:07 -0600
In-Reply-To: <21111.50707.178276.553159@mariner.uk.xensource.com>
References: <21107.62159.140005.466786@mariner.uk.xensource.com>
 <21111.36679.369553.735409@mariner.uk.xensource.com>
 <1383579138.8826.95.camel@kazak.uk.xensource.com>
 <21111.50707.178276.553159@mariner.uk.xensource.com>
List-Id: xen-devel@lists.xenproject.org

On Mon, Nov 4, 2013 at 10:06 AM, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network
> buffering in remus callbacks [and 1 more messages]"):
> > Regardless of the answer here, would it make sense to do some/all of
> > the checkpoint processing in the helper subprocess anyway and only
> > signal the eventual failover up to the libxl process?
>
> It might do. Mind you, the code in libxc is tangled enough as it is and
> is due for a rewrite. Perhaps this could be done in the helper
> executable, although there isn't currently any easy way to intersperse
> code in there.
>
> > This async op is potentially quite long-running, I think, compared to
> > a normal one, i.e. if the guest doesn't die the ao is expected to live
> > "forever". Since the associated gc's persist until the ao ends, this
> > might end up accumulating lots of allocations. Ian had a similar
> > concern about Roger's hotplug daemon series and suggested creating a
> > per-iteration gc or something.
>
> Yes, this is indeed a problem. Well spotted.
>
> Which of the xc_domain_save (and _restore) callbacks are called each
> Remus iteration?

Almost all of them on the xc_domain_save side (suspend, resume, save qemu
state, checkpoint). xc_domain_restore doesn't have any callbacks AFAIK, and
Remus currently has no component on the restore side; it piggybacks on live
migration's restore framework.

> I think introducing a per-iteration gc here is going to involve taking
> some care, since we need to be sure not to put per-iteration-gc-allocated
> objects into data structures which are used by subsequent iterations.

FWIW, the Remus-related code that executes per iteration does not allocate
anything. All allocations happen only during setup, and I was under the
impression that no other allocations take place each time xc_domain_save
calls back into libxl.

However, it may be that other parts of the AO machinery (and there are a
lot of them) are allocating per iteration. If that is the case, it could
easily lead to OOMs, since Remus effectively runs as long as the domain
lives.
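(As an aside, a minimal sketch of the per-iteration gc idea under
discussion might look like the code below. None of these names, iter_gc,
iter_alloc, iter_gc_free or on_checkpoint, exist in libxl, and the real gc
machinery is rather more involved; the point is only that anything
allocated while servicing one checkpoint callback is freed before the next
iteration, so a long-lived ao does not accumulate allocations.)

#include <stdlib.h>
#include <string.h>

/* A tiny per-iteration allocation arena (illustrative only). */
struct iter_gc {
    void **ptrs;        /* allocations owned by this iteration */
    size_t n, cap;
};

static void *iter_alloc(struct iter_gc *gc, size_t sz)
{
    if (gc->n == gc->cap) {
        /* error handling elided for brevity */
        gc->cap = gc->cap ? gc->cap * 2 : 16;
        gc->ptrs = realloc(gc->ptrs, gc->cap * sizeof(*gc->ptrs));
    }
    void *p = calloc(1, sz);
    gc->ptrs[gc->n++] = p;
    return p;
}

static void iter_gc_free(struct iter_gc *gc)
{
    for (size_t i = 0; i < gc->n; i++)
        free(gc->ptrs[i]);
    free(gc->ptrs);
    memset(gc, 0, sizeof(*gc));
}

/* Each checkpoint callback allocates from a fresh iter_gc and frees it
 * before returning, taking care never to stash these pointers in state
 * that outlives the iteration. */
static void on_checkpoint(void)
{
    struct iter_gc gc = { 0 };
    char *scratch = iter_alloc(&gc, 128);   /* per-iteration scratch buffer */
    (void)scratch;                          /* ... do the checkpoint work ... */
    iter_gc_free(&gc);
}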
> Shriram writes:
> > Fair enough. My question is: what is the overhead of setting up,
> > firing and tearing down a timeout event using the event gen framework,
> > if I wish to checkpoint the VM, say, every 20ms?
>
> The ultimate cost of going back into the event loop to wait for a
> timeout will depend on what else the process is doing. If the process
> is doing nothing else, it's about two calls to gettimeofday and one to
> poll, plus a bit of in-process computation, but that's going to be
> swamped by system call overhead.
>
> Having said that, libxl is not performance-optimised. Indeed the
> callback mechanism involves context switching, and IPC, between the
> save/restore helper and libxl proper. Probably not too much to be doing
> every 20ms for a single domain, but if you have a lot of these it's
> going to end up taking a lot of dom0 cpu etc.

Yes, and that is a problem. Xend+Remus avoided it by linking against the
libcheckpoint library, which interfaced with both the Python and libxc
code.

> I assume you're not doing this for HVM domains, which involve saving
> the qemu state each time too.

It includes HVM domains too, although in that case xenstore-based suspend
takes about 5ms, so the checkpoint interval is typically 50ms or so. If
there is a latency-sensitive task running inside the VM, a lower
checkpoint interval leads to better performance.
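(To put the numbers above in perspective, here is a rough, hypothetical
sketch, not libxl code, of what an idle per-checkpoint timeout boils down
to: arm a deadline, drop into poll() until it expires, then run the
checkpoint. That is roughly the two gettimeofday calls and one poll per
interval described above; everything else is dwarfed by the checkpoint
work itself. All names below, including checkpoint_once and
CHECKPOINT_INTERVAL_MS, are illustrative only.)

#include <poll.h>
#include <sys/time.h>

#define CHECKPOINT_INTERVAL_MS 20    /* ~50ms is more typical for HVM guests */

static long long now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000LL + tv.tv_usec / 1000;
}

/* Placeholder for the real work: suspend the domain, send dirty pages,
 * resume, and buffer outbound network traffic until the checkpoint commits. */
static void checkpoint_once(void) { }

int main(void)
{
    for (;;) {
        long long deadline = now_ms() + CHECKPOINT_INTERVAL_MS; /* gettimeofday #1 */
        long long remaining = deadline - now_ms();               /* gettimeofday #2 */
        if (remaining > 0)
            poll(NULL, 0, (int)remaining);                       /* one poll() per timeout */
        checkpoint_once();
    }
}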