From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shriram Rajagopalan
Subject: Re: libxl - API call to return sxpr of a domain?
Date: Wed, 8 Jun 2011 11:55:20 -0400
Message-ID:
References: <1307437379.775.513.camel@zakaz.uk.xensource.com> <1307463411.775.652.camel@zakaz.uk.xensource.com>
Reply-To: rshriram@cs.ubc.ca
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1325964340=="
Return-path:
In-Reply-To: <1307463411.775.652.camel@zakaz.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

--===============1325964340==
Content-Type: multipart/alternative; boundary=0016e68deb3d73f00004a5355f63

--0016e68deb3d73f00004a5355f63
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jun 7, 2011 at 12:16 PM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:

> On Tue, 2011-06-07 at 16:30 +0100, Shriram Rajagopalan wrote:
> > On Tue, Jun 7, 2011 at 5:02 AM, Ian Campbell <Ian.Campbell@citrix.com>
> > wrote:
> >
> >         On Tue, 2011-06-07 at 04:30 +0100, Shriram Rajagopalan wrote:
> >         > I am looking into adding Remus support for libxl. The easiest way is
> >         > to obtain the domain's sxpr, so that the rest of Remus python code
> >         > stays as is.
> >         >
> >         > Is there an api call in libxl to return a domain's sxpr ? a grep on
> >         > the libxl code base returned nothing. Or am I missing something pretty
> >         > obvious?
> >
> >
> >         xl has some code to do this but libxl doesn't. An sxpr representation of
> >         a domain is rather a xend specific concept which is the only reason xl
> >         has it.
> >         There are some plans to allow libxl to generate json for any of the IDL
> >         defined datastructures, mostly as a convenient pretty-printer but being
> >         machine parsable is a handy side-effect. Currently this would just be
> >         for individual datastructures though.
> >
> >         Where/how does remus use sxp?
> >         tools/python/xen/remus/vm.py:domtosxpr()
> >         seems to consume a xend datastructure and make a Remus sxp out of it --
> >         can an xl equivalent not be written using the python bindings? (NB
> >         bindings may be incomplete, we can fix up as you discover stuff). Are
> >         all usages of sxp in Remus of that particular sxp format or are there
> >         others?
> >
> >
> > The only reason remus uses sxpr is because xend conveys info in that
> > form. Basically, it only needs the vif device name (vif1.0, etc), the
> > disk device name and the access format (tap/drbd) for proper
> > operation.
>
> ok, this stuff should be available to xl/libxl (as appropriate) pretty
> easily.
>
> [...]
> > The reason for bypassing the usual xend live migration code path is
> > because of the callbacks, the checkpoint interval based
> > suspend/resume, etc. Now that I know that xl/libxl doesnt use sxpr in
> > its wire-protocol (dunce! :( ), the plan would have to be different.
> >
> > (a) Follow the same implementation style like that with xend (bypass
> > xl's live migration mechanism) - involves some code duplication
> > probably for communicating with remote machine, in xl's wire protocol.
> > The advantage is most of remus' python code (save.py, device.py,
> > qdisc.py, code to install/parse IFB devices, tc rules, etc) stays as
> > is.
> >
> > (b) integrate the remus control flow into xl/libxl stack - I dont know
> > how much work that would be yet.
>
> I don't know enough about the needs etc of Remus to make much in the way
> of concrete proposals but in general plan b is the sort of thing we
> would prefer since all toolstacks can then benefit (at least to some
> extent).
>
> Certainly I would prefer to see libxl functions which provide the
> necessary interfaces (likely sharing common code within the library) etc
> to duplication of the code.
>
> Perhaps you could quickly explain the Remus architecture within the xend
> world, which might help us to advise. e.g. How are things different on
> the tx and rx sides with and without Remus? What additional callbacks
> and control flow are there etc?
>
> Do I gather correctly that the thing on the receiving end is not xend
> but rather a Remus process?
>

On the receiving end, there is "no" Remus receiver process.
Well, there are some Remus-related patches that have long been integrated
into xc_domain_restore, but apart from that, everything else is as-is.

The only Remus-specific part on the rx side is the blktap2 userspace driver
(block-remus), which again gets activated by the usual Xend control flow
(as it tries to create a tap device). But I don't think this needs special
treatment as long as xl can parse/accept a spec like
 tap:remus:backupHost:port|aio:/dev/foo (or tap2:remus:.. )
and launch the appropriate blktap2 backend driver (this system is already
in place, afaik).

The bulk of the Remus transmission code is in libxc and hence is agnostic to
both xend/xl. It basically prolongs the last iteration for eternity. It
supplies a callback handler for checkpoint, which adds the "wait" time before
the next suspend (e.g., suspend every 50ms). In the case of Xend, the
checkpoint handler is not supplied and hence the domain is suspended as soon
as the previous iteration finishes.

(a) On the sending side, without Remus, Xend control flow is as follows:
   xm migrate --live <domain> <host>
      (i) XendCheckpoint:save [which writes the signature record, sxp to the socket]
          and issues "xc_save <params>"
      (ii) xc_save calls xc_domain_save with appropriate callback handlers for suspend
           & switch_qemu_logdirty only. These handlers are in libxc/xcutils/xc_save.c.
      (iii) xc_domain_save:
             send dirty pages for max_iters
             if (last_iter) suspend_callback()
             send final set of dirty pages
             send tailbuf data

The callback structure has two other handlers (postcopy aka postresume, checkpoint)
that are used by Remus.
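The role of the optional handlers can be sketched with a toy model. This is a
minimal illustration in Python, not the libxc code: SaveCallbacks, domain_save
and remus_callbacks are hypothetical names, the log strings stand in for real
page transfers, and having the checkpoint handler return a boolean to request
another round is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SaveCallbacks:
    """Hypothetical stand-in for the callback structure passed to xc_domain_save."""
    suspend: Callable[[], bool]                      # pause the domain; False on failure
    postcopy: Optional[Callable[[], None]] = None    # resume the domain (Remus only)
    checkpoint: Optional[Callable[[], bool]] = None  # wait; True asks for another round (assumed)


def domain_save(cb: SaveCallbacks, max_iters: int, log: List[str]) -> None:
    """Toy model of the save loop: records each step in `log`, moves no memory."""
    for i in range(max_iters):
        log.append(f"send dirty pages, iteration {i}")  # live pre-copy rounds
    while True:
        if not cb.suspend():
            log.append("suspend failed")
            return
        log.append("send final dirty pages and tailbuf")
        if cb.checkpoint is None:
            return                  # plain migration: the last iteration runs once
        if cb.postcopy is not None:
            cb.postcopy()           # let the domain run again
        if not cb.checkpoint():     # wait out the interval, then repeat the last iteration
            return


def remus_callbacks(log: List[str], rounds: int) -> SaveCallbacks:
    """Logging callbacks that stop after `rounds` checkpoints (a real run keeps going)."""
    remaining = [rounds]

    def suspend() -> bool:
        log.append("suspend")
        return True

    def postcopy() -> None:
        log.append("resume")

    def checkpoint() -> bool:
        log.append("wait for checkpoint interval")
        remaining[0] -= 1
        return remaining[0] > 0

    return SaveCallbacks(suspend, postcopy, checkpoint)
```

With no checkpoint handler, domain_save suspends once, sends the final pages
and returns, which is the plain migration case. With one, it keeps cycling
through suspend, send, resume and wait until the handler declines.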
*************************
(b) On the sending side, with Remus
     remus <domain> <host>
        (i) tools/remus/remus:
            - calls tools/python/xen/remus/vm.py:VM(domid)
            - vm.py:VM issues an xmlrpc call to Xend to obtain domid's sxpr and
              extracts the disk/vif info.
        (ii) create the "buffers" for disk & vif.
        (iii) Connect with remote host's Xend socket and send the sxp info.
              [same as (i) for non-Remus case]

        (iv) tools/python/xen/remus/save.py:Saver uses libcheckpoint to initiate checkpointing.
             tools/python/xen/lowlevel/checkpoint: has suspend/resume handlers similar to
             xc_save.c, plus trampoline functions to bounce the callbacks for suspend,
             postcopy and checkpoint to their python equivalents.

             tools/python/xen/lowlevel/checkpoint/libcheckpoint.c:checkpoint_start calls
             xc_domain_save with all needed callback handlers.
                ---> functionally equivalent to (ii) in non-Remus case.
        (v) xc_domain_save: (after the initial iterations)
            copypages:
              send dirty pages & tailbuf data
              postcopy_callback() [resumes domain]
              checkpoint_callback()
                  netbuffer_checkpoint() [python - communicates via netlink to sch_plug]
                  diskbuffer_checkpoint() [python - communicates via fifo to block-remus]
                  sleep(50ms) [or whatever checkpoint interval]
                  return
              suspend_callback()
              goto copypages

Hope that explains the control flow.

shriram

> Ian.
>
>

--0016e68deb3d73f00004a5355f63
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--0016e68deb3d73f00004a5355f63--

--===============1325964340==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============1325964340==--