From: Andrew Cooper <andrew.cooper3@citrix.com>
To: rshriram@cs.ubc.ca
Cc: FNST-Yang Hongyang <yanghy@cn.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v5 RFC 01/14] docs: libxc migration stream specification
Date: Sun, 22 Jun 2014 17:01:13 +0100
Message-ID: <53A6FDC9.3040908@citrix.com>
In-Reply-To: <CAP8mzPNJiDpMpcTcb9f434LpJqKq4MN9Tw=MXUBG_Mo1OJ-Qpg@mail.gmail.com>


On 22/06/14 15:36, Shriram Rajagopalan wrote:
>
>
> On Jun 19, 2014 4:16 PM, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
> >
> > On 19/06/14 11:23, Hongyang Yang wrote:
> > > On 06/19/2014 05:36 PM, Andrew Cooper wrote:
> > >> On 19/06/14 10:13, Hongyang Yang wrote:
> > >>> Hi Andrew, Ian,
> > >>>
> > >>> On 06/18/2014 02:04 AM, Andrew Cooper wrote:
> > >>>> On 17/06/14 17:40, Ian Campbell wrote:
> > >>>>> On Wed, 2014-06-11 at 19:14 +0100, Andrew Cooper wrote:
> > >>>>>> +The following features are not yet fully specified and will be
> > >>>>>> +included in a future draft.
> > >>>>>> +
> > >>>>>> +* Remus
> > >>>>> What is the plan for Remus here?
> > >>>>>
> > >>>>> It has pretty large implications for the flow of a migration
> > >>>>> stream and
> > >>>>> therefore on the code in the final two patches, I suspect it will
> > >>>>> require high level changes to those functions, so I'm reluctant to
> > >>>>> spend
> > >>>>> a lot of time on them as they are.
> > >>>>
> > >>>> I don't believe too much change will be required to the final two
> > >>>> patches, but it does depend on fixing the current qemu record layer
> > >>>> violations.
> > >>>>
> > >>>> It will be much easier to do after a prototype to the libxl level
> > >>>> fixes.
> > >>>
> > >>> I'm trying to port Remus to migration v2...
> > >>
> > >> Ah fantastic! Here I was expecting to eventually have to brave that
> > >> code myself.
> > >>
> > >> How is it going?  How are you finding hacking on v2 compared to the
> > >> legacy code? (I think you are the first person who isn't me trying to
> > >> extend it)  Is there anything I can do while still developing v2
> > >> to make things easier?
> > >
> > > > It's just starting, and so far only on the libxc side, based on your
> > > > patch series. The v2 code is cleaner than the legacy code, easier to
> > > > understand, and yes, it makes hacking easier. I may need your help as
> > > > the hacking goes on...
> > >
> > >>
> > >>
> > >> I really need to get a prototype libxl framing document sorted, but
> > >> in principle my plan (given only a minimum understanding of the
> > >> algorithm) is this:
> > >>
> > >> ...
> > >> * Write page data update
> > >> * Write vcpu context etc
> > >> * Write a REMUS_CHECKPOINT record (or appropriate name)
> > >> * Call the checkpoint callback, passing ownership of the fd to libxl
> > >> ** libxl writes a libxl qemu record into the stream
> > >> * checkpoint callback returns to libxc, returning ownership of the fd
> > >> * libxc chooses between sending an END record or looping
> > >> ...
> > >>
> > >> The fd ownership is expected to work exactly the same on the receiving
> > >> side, using the REMUS_CHECKPOINT record as an indicator.
> > >
> > > > It mostly looks plausible, but the save side and restore side need to
> > > > be synchronised; otherwise the following problem may arise:
> > > >   the sending side is in libxl and sends qemu records while the
> > > >   receiving side is still in libxc; by the time it switches to libxl,
> > > >   part of a record may be lost.
> > > > Maybe a handshake will solve the problem, whether it's in libxl or
> > > > libxc, but the current migration frame does not support sending
> > > > messages from the receiving side to the sending side, so it needs
> > > > modification. We should support this feature.
> >
> > Ah yes I see.
> >
> > How about this?
> >
> > Libxc REMUS_CHECKPOINT is defined as a 0-length record (like the current
> > END record).
> > Libxl REMUS_CHECKPOINT is defined as containing at least a "last
> > checkpoint" bit in the header.
> >
> > Libxc writes a libxc REMUS_CHECKPOINT record into the stream and always
> > hands the fd to libxl.
> > Libxl then writes a libxl REMUS_CHECKPOINT record, including the last
> > checkpoint bit if needed.
> >
>
> I am a bit lost on this part. A silly question: the last I recall (a
> long time ago), the v2 format didn't allow for the page compression to
> be done asynchronously. Has this limitation changed?
>

The v2 format specifies records in a stream; nothing more.  It has no
bearing on whether the page compression happens asynchronously wrt
unpausing the domain or not.
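
To make "records in a stream" concrete, here is a minimal sketch of
record framing with a zero-length REMUS_CHECKPOINT record, as proposed
in the quoted text above.  The header layout (32-bit type, 32-bit body
length, little-endian), the padding to 8 bytes and both type values are
assumptions for illustration, not taken from the specification.

```python
import io
import struct

# Hypothetical type values, for illustration only.
REC_TYPE_END = 0x00000000
REC_TYPE_REMUS_CHECKPOINT = 0x0000FFFF

# Assumed header layout: 32-bit type, 32-bit body length, little-endian.
HDR = struct.Struct("<II")


def write_record(stream, rec_type, body=b""):
    """Write one record: header, body, zero padding to an 8-byte boundary."""
    pad = (-len(body)) % 8
    stream.write(HDR.pack(rec_type, len(body)) + body + b"\0" * pad)


def read_record(stream):
    """Read one record and return (type, body) with the padding stripped."""
    hdr = stream.read(HDR.size)
    if len(hdr) != HDR.size:
        raise EOFError("truncated record header")
    rec_type, length = HDR.unpack(hdr)
    padded = length + (-length) % 8
    data = stream.read(padded)
    if len(data) != padded:
        raise EOFError("truncated record body")
    return rec_type, data[:length]


# A checkpoint marker followed by END; both carry no body.
buf = io.BytesIO()
write_record(buf, REC_TYPE_REMUS_CHECKPOINT)
write_record(buf, REC_TYPE_END)
```

Nothing in the framing says when the bytes of a record are produced,
only what order they appear in on the wire.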

I presume you actually mean the current implementation...

> IOW, in the current migration process, the dirty page data is written
> out while the guest remains suspended. With remus, the compressed page
> data is written out after resuming the guest. This deferred write out
> logic needs to be incorporated into v2 code.
>

... which is the way it is because the first implementation was done
with regular basic migration as a top priority.  This can certainly be
reworked when remus support is reintroduced.
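
The ordering being asked for could look roughly like the sketch below:
copy the dirty pages while the guest is paused, resume it, and only
then write to the stream.  FakeGuest and checkpoint_deferred are
hypothetical names for illustration; this is not libxc code and it
leaves out compression entirely.

```python
import io


class FakeGuest:
    """Stand-in for a domain; it only records the order of operations."""

    def __init__(self, dirty_pages):
        self.dirty_pages = dirty_pages
        self.log = []

    def suspend(self):
        self.log.append("suspend")

    def resume(self):
        self.log.append("resume")


def checkpoint_deferred(guest, stream):
    """Stage dirty pages while paused, write them out after resuming."""
    guest.suspend()
    staged = [bytes(page) for page in guest.dirty_pages]  # copy while paused
    guest.log.append("copy")
    guest.resume()
    for page in staged:  # the guest is already running again here
        stream.write(page)
    guest.log.append("write")
    return len(staged)


guest = FakeGuest([b"\x01" * 4, b"\x02" * 4])
out = io.BytesIO()
checkpoint_deferred(guest, out)
print(guest.log)  # ['suspend', 'copy', 'resume', 'write']
```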

> > This means that it is libxl on the receiving side which determines
> > whether the last checkpoint has been reached, and libxc must always pass
> > the fd up.  This fixes the synchronisation issues, without requiring a
> > back channel, but still maintaining appropriate layering.
> >
>
> So there is a TODO item in the current libxl-remus patches. We need an
> explicit acknowledgement from the receiver side that it has gotten the
> memory checkpoint. Whether it is from libxc or libxl on the receiver
> side does not matter, as long as the ack signifies reception of the
> memory checkpoint.
> The need for an explicit memory ack is because the disk and memory
> checkpoint channels are independent.
> We need both acks before releasing the buffered network output on the
> receiver side.
>   The disk channel (blktap2 or DRBD) has always sent an explicit ack.
> But not the memory channel. Though it's over TCP, on a given iteration,
> memory checkpoint data may still reside on the sender side socket
> buffer while the disk checkpoint has reached the other end -- which
> isn't good.
>
> Existing libxc code does a fdatasync or fsync on the fd at the end of
> each iteration. I don't think it works as intended on TCP sockets.
> Please correct me if I am wrong about this.
>

That is a very sensible need for an explicit ack, although it would seem
to make more sense at the libxl level rather than the libxc level.
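
As a rough sketch of what such an ack could look like on the wire: the
sender tags each memory checkpoint with a number, and the receiver
echoes that number back once it has read the whole payload.  The message
layout and every function name below are assumptions for illustration,
not an existing libxl or libxc interface.

```python
import socket
import struct

CKPT_HDR = struct.Struct("<QI")  # hypothetical: checkpoint number, payload length
ACK = struct.Struct("<Q")        # hypothetical: checkpoint number echoed back


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer goes away."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def send_checkpoint(sock, epoch, payload):
    """Sender: sendall() returning only means the data was handed to the kernel."""
    sock.sendall(CKPT_HDR.pack(epoch, len(payload)) + payload)


def receive_and_ack(sock):
    """Receiver: ack only after the complete payload has been read."""
    epoch, length = CKPT_HDR.unpack(recv_exact(sock, CKPT_HDR.size))
    payload = recv_exact(sock, length)
    sock.sendall(ACK.pack(epoch))
    return epoch, payload


def wait_for_ack(sock, epoch):
    """Sender: True once the receiver has confirmed this checkpoint."""
    (acked,) = ACK.unpack(recv_exact(sock, ACK.size))
    return acked == epoch
```

With something along these lines, the buffered network output would be
released only after wait_for_ack() succeeds for the memory channel and
the disk channel has delivered its own ack.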

~Andrew

