xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
To: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: libxl - API call to return sxpr of a domain?
Date: Thu, 9 Jun 2011 19:59:37 -0400	[thread overview]
Message-ID: <BANLkTinvU-pctriAUmYH8g2_GZ67kmCVXw@mail.gmail.com> (raw)
In-Reply-To: <1307619285.775.823.camel@zakaz.uk.xensource.com>


[-- Attachment #1.1: Type: text/plain, Size: 7894 bytes --]

On Thu, Jun 9, 2011 at 7:34 AM, Ian Campbell <Ian.Campbell@eu.citrix.com>wrote:

> On Wed, 2011-06-08 at 16:55 +0100, Shriram Rajagopalan wrote:
>
> > On the receiving end, there is "no" Remus receiver process.
> > Well, there are some remus related patches, that have long been
> > integrated into xc_domain_restore, but apart from that, everything
> > else is as-is.
>
> OK.
>
> > The only remus specific part on rx side, is the blktap2 userspace
> > driver (block-remus), which again gets activated by usual Xend control
> > flow (as it tries to create a tap device). But I dont think this needs
> > special treatment as long as xl can parse/accept spec like
> > tap:remus:backupHost:port|aio:/dev/foo (or tap2:remus:.. ).
> > and launch the appropriate blktap2 backend driver (this system is
> > already in place, afaik).
>
> Hmm. Please see docs/misc/xl-disk-configuration.txt for the
> configuration syntax understood by xl. Also note that IanJ has a series
> outstanding which improves the syntax, including compat with xend
> syntaxes and makes it more extensible for the future. The series
> includes an updated version of the doc, you'd be better off reading the
> new version than what is currently in the tree. A pre-patched version is
> attached.
>
> It doesn't currently support "remus:" and the "foo:" prefixes are in
> general deprecated. It looks like "remus:" will fall into the category
> of things which are supported via the script= directive. We've also
> grandfathered some "foo:" prefixes as shorthand for the script syntax
> (this is also how xend implemented them), so I think this will continue
> to work (assuming calling a script is how this works in xend, if not
> such a script might be needed).
>
> The "foo|bar" syntax is completely new to me (and I suspect anyone else
> not familiar with remus). How does it work? Is the full
> "backupHost:port|aio:/dev/foo" considered the argument to Remus (in
> which case I think it can work via the Remus script as above) or does
> xend somehow parse this into "remus:backupHost:port" and "aio:/dev/foo"?
> In the latter case I've no idea what to suggest!
>
> I dont think the script= directive is going to work (or even
necessary). The entire "foo|bar" part is handled by the blktap2 code base.
IOW, if the disk spec is tap:remus:host:port|aio:/dev/abc, then xl invokes
the blktap2 code and passes remus:host:port|aio:/dev/abc , which gets
parsed and both remus and aio drivers are created (remus driver on top of
aio).

Have you considered making Remus a more top-level domain configuration
> option rather than disk specific? i.e. adding remus_backup = "..." to
> the cfg. This would allow libxl to do the right thing internally and
> setup the disks in the right way etc etc.
>
> Yes I have, several times. Wading through xend code was not so much fun :(.
With xl, as long as it can construct the "remus:host:port|aio:/dev/abc" arg
and
pass it to the blktap2 code, things should be fine.
 With a DRBD based backend, nothing of this sort is required. Xend
automatically
invokes the block-drbd script, which does the rest. If xl does the same,
then things
should be fine.

> Doing The Right thing is something we are striving towards with libxl,
> especially with disk config which is unnecessarily complex for users.
>
> e.g. it should not be necessary for a user to specifically ask for tap
> or phy etc, rather they should present the path to the thing and libxl
> should figure out if blkback or blktap is needed. For example if Remus
> were enabled then it should DTRT and always select blktap even if
> blkback is otherwise suitable.

> The bulk of Remus transmission data is in libxc and hence is agnostic
> > to both xend/xl. It basically prolongs the last iteration for
> > eternity. It supplies a callback handler for checkpoint, which adds
> > the "wait" time before the next suspend (e.g., suspend every 50ms). In
> > case of Xend, the checkpoint handler is not supplied and hence the
> > domain is suspended as soon as the previous iteration finishes.
>
> I think exposing such a callback is within the scope of the libxl API.
> For example libxl_domain_checkpoint(...., callback) and
> libxl_domain_suspend(...) can probably backend onto the same internal
> function.
>
> Another option to the callbacks might be to integrate with the libxl
> event handling mechanism. Note that IanJ wants to overhall this from its
> current state. I'm less sure whether this would make sense.
>
> >
> > (a) On the sending side, without Remus, Xend control flow is as
> > follows:
> [...]
>
> looks mostly the same as xl, except xl does all the xc_domain_save stuff
> in process rather than indirecting via an external binary.

Do you mean that xl does all the xend stuff ? Because xl still calls
xc_domain_save
in libxl_dom.c:libxl__domain_suspend_common


> Also xl has
> to take care of starting a receiver process on the other end and has a
> bit more of a protocol interlock surrounding the actual migration to try
> and ensure the other end really is ready and hasn't failed etc.
>
>
> The callback structure has two other handlers (postcopy aka
> > postresume, checkpoint) that
> > is used by Remus.
> > *************************
> > (b) On sending side, with Remus
> >       remus <domain> <host>
>
> I suppose here there is a choice between adding libxl/xl support to this
> remus binary or implementing "xl remus <domain> <host>".
>
> The latter is what I wanted to do.

> >          (i) tools/remus/remus:
> >             - calls tools/python/xen/remus/vm.py:VM(domid)
> >             - vm.py:VM issues xmlrpc call to Xend to obtain domid's
> > sxpr and extract out the disk/vif info.
>
> Could be done via the libxl python bindings in the xl case?
>
> yep

> >          (ii) create the "buffers" for disk & vif.
>
> Stays the same, I guess, if you stick with the remus tool.
>
> >          (iii) Connect with remote host's Xend socket and send the sxp
> > info. [same as (i) for non Remus case]
>
> Hrm, this would involve duplicating a bunch of xl functionality to start
> the receiver, and run the xl protocol etc.
>
> That rather suggests that at least this bit should be in xl itself
> rather than remus. This needn't necessarily involve putting everything
> in xl, but just forking xl for this bit.
>
> >           (iv) tools/python/xen/remus/save.py:Saver uses libcheckpoint
> > to initiate checkpointing.
> >                 tools/python/xen/lowlevel/checkpoint: has
> > suspend/resume handlers similar to xc_save.c
> >                 trampoline functions to bounce the callbacks for
> > suspend, postcopy and checkpoint to their
> >                 python equivalents.
>
> I think either handling these inside libxl or bouncing them to the
> caller (depending on the nature of the callback) would be reasonable.
>
> >
> >
> > tools/python/xen/lowlevel/checkpoint/libcheckpoint.c:checkpoint_start
> > calls xc_domain_save with
> >                  all needed callback handlers.
> >                      ---> functionally equivalent to (ii) in non-Remus
> > case.
> >            (v) xc_domain_save: (after the initial iterations)
> >               copypages:
> >                send dirty pages & tailbuf data
> >                postcopy_callback() [resumes domain]
> >                checkpoint_callback()
> >                    netbuffer_checkpoint() [python - communicates via
> > netlink to sch_plug]
> >                    diskbuffer_checkpoint() [python - communicates via
> > fifo to block-remus]
> >                    sleep(50ms) [or whatever checkpoint interval]
> >                    return
> >                 suspend_callback()
> >                 goto copypages
> >
> > Hope that explains the control flow.
>
> I think so. Thanks.
>
> Hopefully some of the suggestions even make sense and demonstrate my new
> understanding ;-)
>
> >
> > shriram
> >
> >         Ian.
> >
> >
>
> shriram

[-- Attachment #1.2: Type: text/html, Size: 10601 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

  reply	other threads:[~2011-06-09 23:59 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-07  3:30 libxl - API call to return sxpr of a domain? Shriram Rajagopalan
2011-06-07  9:02 ` Ian Campbell
2011-06-07 15:30   ` Shriram Rajagopalan
2011-06-07 16:16     ` Ian Campbell
2011-06-08 15:55       ` Shriram Rajagopalan
2011-06-09 11:34         ` Ian Campbell
2011-06-09 23:59           ` Shriram Rajagopalan [this message]
2011-06-10  6:27             ` Ian Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BANLkTinvU-pctriAUmYH8g2_GZ67kmCVXw@mail.gmail.com \
    --to=rshriram@cs.ubc.ca \
    --cc=Ian.Campbell@eu.citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).