Re: [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Roger Pau Monne <roger.pau@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code
Date: Thu, 19 Dec 2013 13:37:55 -0600	[thread overview]
Message-ID: <CAP8mzPMCKzyRGrP3YGbbaAedyTeFvT3_ODGdUWPViBS2xrf4pA@mail.gmail.com> (raw)
In-Reply-To: <21139.28037.675641.135725@mariner.uk.xensource.com>


[-- Attachment #1.1: Type: text/plain, Size: 16079 bytes --]

On Mon, Nov 25, 2013 at 9:32 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:

> Shriram Rajagopalan writes ("[PATCH 2 of 4 V5] tools/libxl: Remus network
> buffering - hotplug scripts and setup code"):
> > tools/libxl: Remus network buffering - hotplug scripts and setup code
>
> Thanks.  This is going in the right direction, but I think there's
> still some way to go.
>
>
> One thing that makes reviewing it particularly difficult is the
> physical ordering of the new code in your patch.
>
> The callback programming model makes it easy to write spaghetti logic;
> to counter this we try to have a firm structure to each set of event
> handling code:
>
>  * Firstly, function forward declarations, types, structs, etc.
>
>  * Then helper functions which have a simple "call this and it
>    does something and returns" control flow.
>
>  * Then the main body of the event callbacks, in the order in
>    which they will normally be called.  Obviously there is some
>    judgement needed because the call flow won't always be entirely
>    linear.  But keeping the layout as linear as possible means that
>    one is never hunting for the next function.
>
>  * Call flow of a single operation does not bounce back and forth
>    between files.  If this is necessary a formal sub-operation is
>    defined (see how it's done with the bootloader, for example).
>
> I think you would be able to structure the code this way.  Would you
> do that please ?
>
> I think you may need to move several of the entrypoints into a new
> "remus" file.  The "netbuf" file may need to become a more formally
> separated part of the suspend/resume function.
>
> We also try to break out sub-operations, giving them their own types
> and callbacks.
>
>
> > diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/Makefile
> > --- a/tools/libxl/Makefile    Mon Nov 18 10:17:35 2013 -0800
> > +++ b/tools/libxl/Makefile    Mon Nov 18 11:09:34 2013 -0800
> > @@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
> > -static void remus_failover_cb(libxl__egc *egc,
> > -                              libxl__domain_suspend_state *dss, int rc);
> > +static void remus_replication_failure_cb(libxl__egc *egc,
> > +                                         libxl__domain_suspend_state
> *dss,
> > +                                         int rc);
> ...
> > -    dss->remus = info;
>
> This change from dss->remus to dss->remus_ctx needs a mention in the
> commit message.  To be honest, I'm not sure why you don't just change
> the type of the remus member.
>
> > diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_internal.h
> > --- a/tools/libxl/libxl_internal.h    Mon Nov 18 10:17:35 2013 -0800
> > +++ b/tools/libxl/libxl_internal.h    Mon Nov 18 11:09:34 2013 -0800
> > @@ -2291,6 +2291,50 @@ typedef struct libxl__logdirty_switch {
> >      libxl__ev_time timeout;
> >  } libxl__logdirty_switch;
> >
> > +typedef struct libxl__remus_ctx {
> > +    /* checkpoint interval */
> > +    int interval;
> > +    int blackhole;
> > +    int compression;
> > +    int saved_rc;
> > +    /* Script to setup/teardown network buffers */
> > +    const char *netbufscript;
> > +    /* Opaque context containing network buffer related stuff */
> > +    void *netbuf_ctx;
> > +    libxl__domain_suspend_state *dss;
> > +    libxl__ev_time timeout;
> > +    libxl__ev_child child;
> > +    int num_exec;
> > +} libxl__remus_ctx;
>
> I think you should probably call this a "libxl__remus_state", to
> correspond to all of the other "libxl__foo_state" structs.
>
> You also need to specify in comments who fills in what parts of it, as
> is done in the other libxl__foo_state's.
>
>
> > diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_netbuffer.c
> > --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> > +++ b/tools/libxl/libxl_netbuffer.c   Mon Nov 18 11:09:34 2013 -0800
> > @@ -0,0 +1,516 @@
> ...
> > +typedef struct libxl__remus_netbuf_ctx {
> > +    struct rtnl_qdisc **netbuf_qdisc_list;
> > +    struct nl_sock *nlsock;
> > +    struct nl_cache *qdisc_cache;
> > +    char **vif_list;
> > +    char **ifb_list;
> > +    uint32_t num_netbufs;
> > +    uint32_t unused;
> > +} libxl__remus_netbuf_ctx;
>
> Presumably this wants to be a libxl__remus_netbuf_state, then.
>
> The callbacks from the remus and netbuf code to the suspend code
> should not be hardwired.  Instead, function pointers should be passed,
> the way the domain creation stuff does with a bootloader, and the way
> the bootloader does with openpty.
>
> This may seem like extra complication if there is only one caller and
> it will always pass the same function, but it makes the logical
> separation much clearer.  It's analogous to the separation between an
> ordinary function and its caller, even if there is in fact only one
> call site.
>
> That is, functions like this
>
> +_hidden void libxl__remus_setup_done(libxl__egc *egc,
> +                                     libxl__domain_suspend_state *dss,
> +                                     int rc);
>
> should be private functions in the remus file which make the
> appropriate callback.
>
>
> > +/* If the device has a vifname, then use that instead of
> > + * the vifX.Y format.
> > + */
> > +static char *get_vifname(libxl__gc *gc, uint32_t domid,
> > +                               libxl_device_nic *nic)
> > +{
> > +    const char *vifname = NULL;
> > +    vifname = libxl__xs_read(gc, XBT_NULL,
> > +                             libxl__sprintf(gc,
> > +
>  "%s/backend/vif/%d/%d/vifname",
> > +                                            libxl__xs_get_dompath(gc,
> 0),
> > +                                            domid, nic->devid));
>
> Why not libxl__xs_read_checked ?
>
> > +    if (!vifname) {
>
> Surely this error handling is wrong.  NULL here might mean "failed" or
> ENOENT.  If it means failed then get_vifname needs to return the error
> to its caller, by returning NULL.
>
> > +        vifname = libxl__device_nic_devname(gc, domid,
> > +                                            nic->devid,
> > +                                            nic->nictype);
> > +    }
> > +
> > +    return libxl__strdup(gc, vifname);
>
> Why this strdup ?
>
> > +static char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
> > +                                 int *num_vifs)
> > +{
> > +    libxl_device_nic *nics = NULL;
> > +    int nb, i = 0;
> > +    char **vif_list = NULL;
> > +
> > +    *num_vifs = 0;
> > +    nics = libxl_device_nic_list(CTX, domid, &nb);
> > +    if (!nics) return NULL;
> > +
> > +    /* Ensure that none of the vifs are backed by driver domains */
> > +    for (i = 0; i < nb; i++) {
> > +        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
> > +            LOG(ERROR, "vif %s has driver domain (%u) as its backend.\n"
> > +                "Network buffering is not supported with driver
> domains",
>
> The LOG macros should not normally be passed messages with newlines
> in.
>
> > +                get_vifname(gc, domid, &nics[i]),
> nics[i].backend_domid);
> > +            *num_vifs = -1;
>
> This calling convention is strange.  I think get_guest_vif_list should
> return a libxl rc value.
>
> > +    vif_list = libxl__calloc(gc, nb, sizeof(char *));
> > +    for (i = 0; i < nb; ++i)
> > +        vif_list[i] = get_vifname(gc, domid, &nics[i]);
>
> What if get_vifname fails ?
>
>

The idea here is that if the interface has a name, then we use it.
Otherwise,
we simply revert to the name that Xen assigns it - which is in vifX.Y
format.
But I agree with you that the vifname reading process from xenstore may not
be successful always.


> ...
>
> > +
> > +static int init_qdiscs(libxl__gc *gc,
> > +                       libxl__remus_ctx *remus_ctx)
> > +{
> > +    int i, ret, ifindex;
> > +    struct rtnl_link *ifb = NULL;
> > +    struct rtnl_qdisc *qdisc = NULL;
> ...
> > +    /* convenience vars */
>
> We use the term "Convenience aliases".  It's best to use the exact
> same terminology as elsewhere.
>
> > +    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
> > +    int num_netbufs = netbuf_ctx->num_netbufs;
> > +    char **ifb_list = netbuf_ctx->ifb_list;
>
> These should all be declared const.
>
> > +    /* list of handles to plug qdiscs */
> > +    netbuf_ctx->netbuf_qdisc_list =
> > +        libxl__calloc(gc, num_netbufs,
> > +                      sizeof(struct rtnl_qdisc *));
>
> Please use GCNEW_ARRAY.
>
> > +        if (qdisc) {
> > +            const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
> > +            /* Sanity check: Ensure that the root qdisc is a plug
> qdisc. */
> > +            if (!tc_kind || strcmp(tc_kind, "plug")) {
> > +                LOG(ERROR, "plug qdisc is not installed on %s",
> ifb_list[i]);
>
> You seem to assume that all the failures from rtnl_tc_get_kind are the
> due to the expected cause.  You need to check errno, I think.
>

Not according to the documentation. Its return value is either the name of
the
qdisc or NULL if its not set. There is no mention of errno.
I can only go by whats documented :).


> > +    rtnl_link_put(ifb);
> > +    return 0;
> > +
> > + end:
>
> We call these "out", not "end".  This should be changed everywhere.
>
> > +    if (ifb) rtnl_link_put(ifb);
> > +    return ERROR_FAIL;
>
> Can't this leak netbuf_ctx->nlsock and netbuf_ctx->qdisc_cache ?
>
>
You asked this question before (in a previous version). There is a cleanup
code that gets called that releases these resources (the teardown function).

>
> > +}
> > +
> > +/* the script needs the following env & args
> > + * $vifname
> > + * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
> > + * $IFB (for teardown)
> > + * setup/teardown as command line arg.
> > + * In return, the script writes the name of IFB device (during setup)
> to be
> > + * used for output buffering into XENBUS_PATH/ifb
> > + */
>
> This information should be somewhere else, not buried in the libxl
> source code.  For now, could it be in the one script which you
> actually supply ?
>
>
It is also documented in the script.


>
> > +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx
> *remus_ctx,
> > +                              char *op, libxl__ev_child_callback *death)
> > +{
> > +    int arraysize, nr = 0;
> > +    char **env = NULL, **args = NULL;
> > +    pid_t pid;
>
> This code has many important similarities to the device_hotplug
> functions in libxl_device.c.
>
> For example your timeout callback is pretty much identical to
> device_hotplug_timeout_cb.
>
> I think you should consider whether you can use libxl__ao_device, and
> perhaps share other parts of that code.
>
>
I looked at them before deciding to roll out my own.  Please correct me if
I am
wrong about this:
The ao_device interface while sharing some similarities, does not really
fit in here.
It attempts bring up/down a device (that has been created already, for
e.g., a vif),
add it to the guest and represent it internally using libxl__device or
something.

The netbuf code invokes the script, which in turn sets up an IFB device
that is not
part of the guest's configuration.  The information about the device
returned by the
script is collected in an internal data structure and new handles (plug
qdisc handle)
are created, and added to remus' state.

The control flows are different.  The only commonality is invoke external
script, wait
until it finishes and invoke callback.

On alternative is to hack into the device_hotplug code, and add special
cases. For e.g.,
if device_ptr is null, then this is a remus op, so do Y instead of X.

This kind of special casing would percolate throughout the device hotplug
code,
polluting a what was otherwise cleanly defined interface to handle generic
device hotplug.


If you have any other alternatives in mind, I would be happy to check it
out.


>
> > +    libxl__ev_child_init(child);
>
> This should be done where the remus state is first populated, not
> here.
>
>
> > +    char *script = libxl__strdup(gc, remus_ctx->netbufscript);
>
> These next lot are all convenience aliases AFIACT.  This should be
> mentioned, and they should all be const:
>
> > +    libxl__ev_child *child = &remus_ctx->child;
> > +    libxl__ev_time *timeout = &remus_ctx->timeout;
> > +    uint32_t domid = remus_ctx->dss->domid;
> > +    int devid = remus_ctx->num_exec;
> > +    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
> > +    char *vif = netbuf_ctx->vif_list[devid];
> > +    char *ifb = netbuf_ctx->ifb_list[devid];
>
> So instead, something like:
>
>   +    libxl__ev_child *const child = &remus_ctx->child;
> ...
>   +    const uint32_t domid = remus_ctx->dss->domid;
> ...
>   +    char *const ifb = netbuf_ctx->ifb_list[devid];
>
> There are other places in the code with a similar pattern which also
> need to be changed.
>
>
> > +    if (!strcmp(op, "setup"))
> > +        arraysize = 5;
> > +    else
> > +        arraysize = 7;
>
> Perhaps we could do away with this if() and just always allocate 7 ?
>
> > +    GCNEW_ARRAY(env, arraysize);
> > +    env[nr++] = "vifname";
> > +    env[nr++] = vif;
> > +    env[nr++] = "XENBUS_PATH";
> > +    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
> > +                          libxl__xs_libxl_path(gc, domid), devid);
> > +    if (!strcmp(op, "teardown")) {
>
> I think you should pass op as an enum, and convert it to a string
> right at the end.  That would do away with these strcmps.
>
> > +    remus_ctx->num_exec++;
>
> num_exec is a misleading variable name here.  In libxl_linux.c and
> libxl_device.c it refers to the number of times we have run the
> hotplug script _for each device_, not the iteration over the number of
> devices.
>
> > +static void netbuf_setup_script_cb(libxl__egc *egc,
> > +                                   libxl__ev_child *child,
> > +                                   pid_t pid, int status)
> > +{
> ...
> > +    hotplug_error = libxl__xs_read(gc, XBT_NULL,
> > +                                   GCSPRINTF("%s/hotplug-error",
> > +                                             out_path_base));
>
> Again, libxl__xs_read_checked (several times) ?
>
> > +static void netbuf_setup_timeout_cb(libxl__egc *egc,
> > +                                    libxl__ev_time *ev,
> > +                                    const struct timeval *requested_abs)
> > +{
>
> This seems to be yet another copy of device_hotplug_timeout_cb.
>
>
Yep. See my answer above.

 > +void libxl__remus_netbuf_setup(libxl__egc *egc,
> > +                               libxl__domain_suspend_state *dss)
> > +{
> > +    libxl__remus_netbuf_ctx *netbuf_ctx = NULL;
> > +    uint32_t domid = dss->domid;
> > +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> > +    int num_netbufs = 0;
> > +    int rc = ERROR_FAIL;
> ...
> > +    netbuf_ctx = libxl__zalloc(gc, sizeof(libxl__remus_netbuf_ctx));
>
> Please use GCNEW, not libxl__zalloc, when you can.
>
> > +/* Note: This function will be called in the same gc context as
> > + * libxl__remus_netbuf_setup, created during the
> libxl_domain_remus_start
> > + * API call.
> > + */
> > +void libxl__remus_netbuf_teardown(libxl__egc *egc,
> > +                                  libxl__domain_suspend_state *dss)
> > +{
> ...
> > +    remus_ctx->num_exec = 0; //start at devid == 0
> > +    if (exec_netbuf_script(gc, remus_ctx, "teardown",
> > +                           netbuf_teardown_script_cb))
> > +        libxl__remus_teardown_done(egc, dss);
>
> I think the return value from exec_netbuf_script should be put into an
> rc variable.  And we don't correctly propagate the rc here AFAICT.
>
> > +#define TC_BUFFER_START 1
> > +#define TC_BUFFER_RELEASE 2
>
> Are these #defines not supposed to come from some Linux header file ?
> I don't think having them as constants here can be right.
>
>
They are not constants taken from any library and stuck in here.  They are
merely
for readability.  The related code is remus_netbuffer_op, where you would
find
the buffer/release code blocks controlled through the value of op variable
(which is either BUFFER_START or BUFFER_RELEASE)

 thanks
shriram

[-- Attachment #1.2: Type: text/html, Size: 21190 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2013-12-19 19:37 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
2013-11-18 20:03 ` [PATCH 1 of 4 V5] remus: add libnl3 dependency to autoconf scripts Shriram Rajagopalan
2013-11-18 20:03 ` [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code Shriram Rajagopalan
2013-11-25 15:32   ` Ian Jackson
2013-12-19 19:37     ` Shriram Rajagopalan [this message]
2013-11-18 20:03 ` [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks Shriram Rajagopalan
2013-11-25 15:38   ` Ian Jackson
2013-11-18 20:03 ` [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch Shriram Rajagopalan
2013-11-25 15:37   ` Ian Jackson
2013-12-12  5:45     ` Shriram Rajagopalan
2013-12-12 11:07       ` Ian Campbell
2013-12-12 15:13         ` Ian Jackson
2013-12-12 15:36           ` Ian Campbell
2013-12-12 15:12       ` Ian Jackson
2013-12-12 18:37         ` Shriram Rajagopalan
2013-12-12 18:43           ` Ian Jackson
2013-12-12 18:47           ` Brendan Cully
2013-12-12 18:52             ` Shriram Rajagopalan
2013-12-12 18:55               ` Brendan Cully
2013-11-19 16:23 ` [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Ian Campbell
2013-11-19 17:58   ` Shriram Rajagopalan
2013-11-24  6:51     ` Shriram Rajagopalan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP8mzPMCKzyRGrP3YGbbaAedyTeFvT3_ODGdUWPViBS2xrf4pA@mail.gmail.com \
    --to=rshriram@cs.ubc.ca \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=roger.pau@citrix.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).