Ian Campbell writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]"):
> Regardless of the answer here, would it make sense to do some/all of the
> checkpoint processing in the helper subprocess anyway and only signal
> the eventual failover up to the libxl process?

It might do. Mind you, the code in libxc is tangled enough as it is
and is due for a rewrite. Perhaps this could be done in the helper
executable, although there isn't currently any way to easily
intersperse code in there.

> This async op is potentially quite long running I think compared to a
> normal one i.e. if the guest doesn't die it is expected that the ao
> lives "forever". Since the associated gc's persist until the ao ends
> this might end up accumulating lots of allocations? Ian had a similar
> concern about Roger's hotplug daemon series and suggested creating a per
> iteration gc or something.

Yes, this is indeed a problem. Well spotted.

Which of the xc_domain_save (and _restore) callbacks are called on
each Remus iteration?

I think introducing a per-iteration gc here is going to involve taking
some care, since we need to be sure not to put
per-iteration-gc-allocated objects into data structures which are used
by subsequent iterations.
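
To make that concrete, here is a minimal sketch of what a
per-iteration gc might look like, reusing the existing LIBXL_INIT_GC
and libxl__free_all helpers; remus_one_iteration, do_checkpoint_work
and struct remus_state are invented names, purely for illustration:

  /* Illustrative sketch only, not the current libxl API.  Each
   * checkpoint iteration gets its own short-lived gc, freed before
   * the next iteration, so allocations cannot pile up for the whole
   * (potentially unbounded) lifetime of the ao. */
  static void remus_one_iteration(libxl_ctx *ctx,
                                  struct remus_state *rs /* hypothetical */)
  {
      libxl__gc itgc[1];
      LIBXL_INIT_GC(itgc[0], ctx);

      do_checkpoint_work(itgc, rs);   /* hypothetical helper */

      /* Anything which must survive into later iterations has to be
       * copied into storage owned by the long-lived ao gc before
       * this point, never left in itgc allocations. */
      libxl__free_all(itgc);
  }
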
Shriram writes:

> Fair enough. My question is what is the overhead of setting up, firing
> and tearing down a timeout event using the event gen framework, if I
> wish to checkpoint the VM, say every 20ms?

The ultimate cost of going back into the event loop to wait for a
timeout will depend on what else the process is doing. If the process
is doing nothing else, it's about two calls to gettimeofday and one to
poll. Plus a bit of in-process computation, but that's going to be
swamped by system call overhead.
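
For a sense of what that pattern costs, the idle case is roughly the
following syscall sequence (a standalone sketch, not actual libxl
event-loop code; idle_wait is an invented name):

  #include <stddef.h>
  #include <poll.h>
  #include <sys/time.h>

  /* Roughly the per-iteration pattern when the process has nothing
   * else to do: one gettimeofday to establish "now" and find the
   * nearest timeout, a poll which just sleeps until then (no fds
   * here), and another gettimeofday to decide which timeouts have
   * gone off. */
  static void idle_wait(int interval_ms)
  {
      struct timeval now;

      gettimeofday(&now, NULL);          /* compute nearest deadline */
      (void)poll(NULL, 0, interval_ms);  /* sleep until the deadline */
      gettimeofday(&now, NULL);          /* see which timeouts fired */

      /* ... the checkpoint callback would run at this point ... */
  }
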
Having said that, libxl is not performance-optimised. Indeed the
callback mechanism involves context switching, and IPC, between the
save/restore helper and libxl proper. That is probably not too much
overhead every 20ms for a single domain, but if you have a lot of
domains checkpointing like this it is going to eat a significant
amount of dom0 cpu.

I assume you're not doing this for HVM domains, which involve saving
the qemu state each time too.