From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shriram Rajagopalan
Subject: Re: libxl - API call to return sxpr of a domain?
Date: Thu, 9 Jun 2011 19:59:37 -0400
References: <1307437379.775.513.camel@zakaz.uk.xensource.com>
 <1307463411.775.652.camel@zakaz.uk.xensource.com>
 <1307619285.775.823.camel@zakaz.uk.xensource.com>
Reply-To: rshriram@cs.ubc.ca
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1563878537=="
In-Reply-To: <1307619285.775.823.camel@zakaz.uk.xensource.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

--===============1563878537==
Content-Type: multipart/alternative; boundary=bcaec52d51d73aed5304a550416d

--bcaec52d51d73aed5304a550416d
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Jun 9, 2011 at 7:34 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:

> On Wed, 2011-06-08 at 16:55 +0100, Shriram Rajagopalan wrote:
> > On the receiving end, there is "no" Remus receiver process.
> > Well, there are some remus related patches, that have long been
> > integrated into xc_domain_restore, but apart from that, everything
> > else is as-is.
>
> OK.
>
> > The only remus specific part on rx side, is the blktap2 userspace
> > driver (block-remus), which again gets activated by usual Xend control
> > flow (as it tries to create a tap device). But I don't think this needs
> > special treatment as long as xl can parse/accept spec like
> > tap:remus:backupHost:port|aio:/dev/foo (or tap2:remus:.. ).
> > and launch the appropriate blktap2 backend driver (this system is
> > already in place, afaik).
>
> Hmm. Please see docs/misc/xl-disk-configuration.txt for the
> configuration syntax understood by xl.
> Also note that IanJ has a series
> outstanding which improves the syntax, including compat with xend
> syntaxes and makes it more extensible for the future. The series
> includes an updated version of the doc, you'd be better off reading the
> new version than what is currently in the tree. A pre-patched version is
> attached.
>
> It doesn't currently support "remus:" and the "foo:" prefixes are in
> general deprecated. It looks like "remus:" will fall into the category
> of things which are supported via the script= directive. We've also
> grandfathered some "foo:" prefixes as shorthand for the script syntax
> (this is also how xend implemented them), so I think this will continue
> to work (assuming calling a script is how this works in xend, if not
> such a script might be needed).
>
> The "foo|bar" syntax is completely new to me (and I suspect anyone else
> not familiar with remus). How does it work? Is the full
> "backupHost:port|aio:/dev/foo" considered the argument to Remus (in
> which case I think it can work via the Remus script as above) or does
> xend somehow parse this into "remus:backupHost:port" and "aio:/dev/foo"?
> In the latter case I've no idea what to suggest!

I don't think the script= directive is going to work (or is even
necessary). The entire "foo|bar" part is handled by the blktap2 code base.
IOW, if the disk spec is tap:remus:host:port|aio:/dev/abc, then xl invokes
the blktap2 code and passes it remus:host:port|aio:/dev/abc, which gets
parsed, and both the remus and aio drivers are created (the remus driver
on top of aio).

> Have you considered making Remus a more top-level domain configuration
> option rather than disk specific? i.e. adding remus_backup = "..." to
> the cfg. This would allow libxl to do the right thing internally and
> set up the disks in the right way etc etc.

Yes I have, several times. Wading through xend code was not so much fun :(.
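To illustrate how a chained spec such as tap:remus:host:port|aio:/dev/abc breaks down into stacked drivers, here is a minimal Python sketch. It is an assumption-laden illustration of the decomposition described above, not blktap2's actual parser; the names parse_disk_spec and Layer are hypothetical, and "backup"/9000 are placeholder host and port values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    driver: str  # e.g. "remus" or "aio"
    args: str    # everything after the driver name's first ':'


def parse_disk_spec(spec: str) -> List[Layer]:
    """Split a chained tap disk spec into driver layers, topmost first.

    Hypothetical illustration only: mimics the decomposition described
    in the thread, not blktap2's own parsing code.
    """
    # Drop the backend prefix that selects blktap ("tap:" or "tap2:").
    for prefix in ("tap2:", "tap:"):
        if spec.startswith(prefix):
            spec = spec[len(prefix):]
            break
    layers = []
    for part in spec.split("|"):
        # Split only at the first ':' so "host:port" stays intact.
        driver, sep, args = part.partition(":")
        if not driver or not sep:
            raise ValueError(f"malformed layer: {part!r}")
        layers.append(Layer(driver, args))
    return layers


# "backup" and 9000 are placeholder host/port values.
for layer in parse_disk_spec("tap:remus:backup:9000|aio:/dev/abc"):
    print(layer.driver, layer.args)
# prints "remus backup:9000", then "aio /dev/abc"
```

Returning the layers topmost first mirrors the "remus driver on top of aio" ordering described above.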

With xl, as long as it can construct the "remus:host:port|aio:/dev/abc" arg
and pass it to the blktap2 code, things should be fine.
With a DRBD-based backend, nothing of this sort is required. Xend
automatically invokes the block-drbd script, which does the rest. If xl does
the same, then things should be fine.

> Doing The Right thing is something we are striving towards with libxl,
> especially with disk config which is unnecessarily complex for users.
>
> e.g. it should not be necessary for a user to specifically ask for tap
> or phy etc, rather they should present the path to the thing and libxl
> should figure out if blkback or blktap is needed. For example if Remus
> were enabled then it should DTRT and always select blktap even if
> blkback is otherwise suitable.
>
> > The bulk of Remus transmission data is in libxc and hence is agnostic
> > to both xend/xl. It basically prolongs the last iteration for
> > eternity. It supplies a callback handler for checkpoint, which adds
> > the "wait" time before the next suspend (e.g., suspend every 50ms). In
> > case of Xend, the checkpoint handler is not supplied and hence the
> > domain is suspended as soon as the previous iteration finishes.
>
> I think exposing such a callback is within the scope of the libxl API.
> For example libxl_domain_checkpoint(...., callback) and
> libxl_domain_suspend(...) can probably backend onto the same internal
> function.
>
> Another option to the callbacks might be to integrate with the libxl
> event handling mechanism. Note that IanJ wants to overhaul this from its
> current state. I'm less sure whether this would make sense.
>
> > (a) On the sending side, without Remus, Xend control flow is as
> > follows:
> [...]
>
> looks mostly the same as xl, except xl does all the xc_domain_save stuff
> in process rather than indirecting via an external binary.

Do you mean that xl does all the xend stuff?
Because xl still calls xc_domain_save in
libxl_dom.c:libxl__domain_suspend_common.

> Also xl has to take care of starting a receiver process on the other
> end and has a bit more of a protocol interlock surrounding the actual
> migration to try and ensure the other end really is ready and hasn't
> failed etc.
>
> > The callback structure has two other handlers (postcopy aka
> > postresume, checkpoint) that are used by Remus.
> > *************************
> > (b) On sending side, with Remus
> >       remus <domain> <host>
>
> I suppose here there is a choice between adding libxl/xl support to this
> remus binary or implementing "xl remus <domain> <host>".

The latter is what I wanted to do.

> >          (i) tools/remus/remus:
> >             - calls tools/python/xen/remus/vm.py:VM(domid)
> >             - vm.py:VM issues xmlrpc call to Xend to obtain domid's
> >               sxpr and extract out the disk/vif info.
>
> Could be done via the libxl python bindings in the xl case?

yep

> >          (ii) create the "buffers" for disk & vif.
>
> Stays the same, I guess, if you stick with the remus tool.
>
> >          (iii) Connect with remote host's Xend socket and send the sxp
> >                info. [same as (i) for non Remus case]
>
> Hrm, this would involve duplicating a bunch of xl functionality to start
> the receiver, and run the xl protocol etc.
>
> That rather suggests that at least this bit should be in xl itself
> rather than remus. This needn't necessarily involve putting everything
> in xl, but just forking xl for this bit.
>
> >          (iv) tools/python/xen/remus/save.py:Saver uses libcheckpoint
> >               to initiate checkpointing.
> >                 tools/python/xen/lowlevel/checkpoint: has
> >                   suspend/resume handlers similar to xc_save.c
> >                   trampoline functions to bounce the callbacks for
> >                   suspend, postcopy and checkpoint to their
> >                   python equivalents.
>
> I think either handling these inside libxl or bouncing them to the
> caller (depending on the nature of the callback) would be reasonable.
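To make the callback discussion concrete, here is a minimal Python sketch of the copypages loop outlined for xc_domain_save in step (v) of this thread, combined with Ian's idea that a one-shot suspend and a Remus-style checkpoint back onto one internal function. Python is used only because the existing Remus tools are Python; the real loop lives in libxc's C code. Every name here (SaveCallbacks, save_domain, make_checkpoint, domain_suspend, domain_checkpoint) is hypothetical and is not the libxc or libxl API, and the contract that the checkpoint hook returns False to stop is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SaveCallbacks:
    """Hypothetical stand-in for the suspend/postcopy/checkpoint hooks."""
    suspend: Callable[[], None]
    postcopy: Optional[Callable[[], None]] = None
    # Assumed contract: return True to take another checkpoint, False to stop.
    checkpoint: Optional[Callable[[], bool]] = None


def save_domain(send_dirty_pages: Callable[[], None], cb: SaveCallbacks) -> int:
    """Shared core for a one-shot suspend and Remus-style checkpointing.

    Starts where the live pre-copy iterations end; returns the number of
    final copies sent.
    """
    copies = 0
    cb.suspend()                   # pause the guest for a consistent copy
    while True:                    # "copypages:"
        send_dirty_pages()         # dirty pages + tailbuf data
        copies += 1
        if cb.checkpoint is None:
            return copies          # plain suspend/migrate: one final copy
        if cb.postcopy is not None:
            cb.postcopy()          # resume the guest on the primary
        if not cb.checkpoint():    # release buffered output, wait an epoch
            return copies
        cb.suspend()               # suspend again, then "goto copypages"


def make_checkpoint(interval_s: float, epochs: int,
                    buffers: List[Callable[[], None]]) -> Callable[[], bool]:
    """Checkpoint hook that releases each buffer, then sleeps one interval.

    `epochs` only bounds this demo; Remus itself keeps going until stopped.
    """
    remaining = epochs

    def checkpoint() -> bool:
        nonlocal remaining
        for release in buffers:    # e.g. network and disk buffers
            release()
        time.sleep(interval_s)     # e.g. 0.05 for a 50 ms epoch
        remaining -= 1
        return remaining > 0

    return checkpoint


def domain_suspend(send, suspend) -> int:
    """Hypothetical one-shot suspend: no checkpoint hook supplied."""
    return save_domain(send, SaveCallbacks(suspend=suspend))


def domain_checkpoint(send, suspend, resume, checkpoint) -> int:
    """Hypothetical checkpointing entry point sharing the same core."""
    return save_domain(send, SaveCallbacks(suspend, resume, checkpoint))


log = []
hook = make_checkpoint(0.0, 3, [lambda: log.append("net"),
                                lambda: log.append("disk")])
print(domain_checkpoint(lambda: log.append("send"),
                        lambda: log.append("suspend"),
                        lambda: log.append("resume"), hook))  # prints 3
print(log[:5])  # prints ['suspend', 'send', 'resume', 'net', 'disk']
```

With no checkpoint hook the loop degenerates to a single final copy, matching the plain-migration behaviour described earlier, which is what lets both entry points share one core.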
>
> >   tools/python/xen/lowlevel/checkpoint/libcheckpoint.c:checkpoint_start
> >       calls xc_domain_save with all needed callback handlers.
> >       ---> functionally equivalent to (ii) in non-Remus case.
> >   (v) xc_domain_save: (after the initial iterations)
> >       copypages:
> >         send dirty pages & tailbuf data
> >         postcopy_callback() [resumes domain]
> >         checkpoint_callback()
> >           netbuffer_checkpoint() [python - communicates via netlink to sch_plug]
> >           diskbuffer_checkpoint() [python - communicates via fifo to block-remus]
> >           sleep(50ms) [or whatever checkpoint interval]
> >           return
> >         suspend_callback()
> >         goto copypages
> >
> > Hope that explains the control flow.
>
> I think so. Thanks.
>
> Hopefully some of the suggestions even make sense and demonstrate my new
> understanding ;-)
>
> > shriram
>
>         Ian.

shriram

--bcaec52d51d73aed5304a550416d
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--bcaec52d51d73aed5304a550416d--

--===============1563878537==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============1563878537==--