From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shriram Rajagopalan Subject: Re: [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks Date: Fri, 1 Nov 2013 13:04:16 -0700 Message-ID: References: <21107.62159.140005.466786@mariner.uk.xensource.com> Reply-To: rshriram@cs.ubc.ca Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============8602480014145770308==" Return-path: In-Reply-To: <21107.62159.140005.466786@mariner.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Jackson Cc: Andrew Cooper , Stefano Stabellini , Ian Campbell , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org --===============8602480014145770308== Content-Type: multipart/alternative; boundary=20cf301cc2daa5576f04ea23140b --20cf301cc2daa5576f04ea23140b Content-Type: text/plain; charset=ISO-8859-1 On Fri, Nov 1, 2013 at 11:28 AM, Ian Jackson wrote: > Shriram Rajagopalan writes ("[PATCH 4 of 5 V3] tools/libxl: Control > network buffering in remus callbacks"): > > tools/libxl: Control network buffering in remus callbacks > ... > > static int libxl__remus_domain_suspend_callback(void *data) > > { > > - /* REMUS TODO: Issue disk and network checkpoint reqs. */ > > - return libxl__domain_suspend_common_callback(data); > > + /* REMUS TODO: Issue disk checkpoint reqs. */ > > + libxl__save_helper_state *shs = data; > > + libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs); > > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > > + bool is_suspended; > > + STATE_AO_GC(dss->ao); > > + > > + is_suspended = !!libxl__domain_suspend_common_callback(data); > > I think this function would be less confusing if the return value > variable was called "ok" or something similar. It's really just an > error flag. > > Also AFAICT the !! has no function here. The return value is > essentially a boolean. > > > + if (!remus_ctx->netbuf_ctx) return is_suspended; > > + > > + if (is_suspended) { > > + if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid, > > + remus_ctx)) > > + return !is_suspended; > > This construction is rather odd. Why return !is_suspended when you > know that !!is_suspended ? > > If a new network buffer cannot be created (for the next epoch), then return an error. This ends up terminating the checkpointing process. > > @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi > > static void remus_checkpoint_dm_saved(libxl__egc *egc, > > libxl__domain_suspend_state *dss, > int rc) > > { > > - /* REMUS TODO: Wait for disk and memory ack, release network buffer > */ > > - /* REMUS TODO: make this asynchronous */ > > - assert(!rc); /* REMUS TODO handle this error properly */ > > - usleep(dss->remus_ctx->interval * 1000); > > - libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1); > > + /* > > + * REMUS TODO: Wait for disk and explicit memory ack (through > restore > > + * callback from remote) before releasing network buffer. > > + */ > > + libxl__remus_ctx *remus_ctx = dss->remus_ctx; > > + struct timespec epoch; > > + int do_next_iter = 0; > > Again, this variable is probably best called "ok". > > > + epoch.tv_sec = remus_ctx->interval / 1000; /* interval is in ms */ > > + epoch.tv_nsec = remus_ctx->interval * 1000L * 1000L; > > + nanosleep(&epoch, 0); > > This is no good, I'm afraid. You should be using a libxl event loop > timer. (Also why are you changing your existing usleep to this new > nanosleep which seems functionally very similar?) > > Ian. > > --20cf301cc2daa5576f04ea23140b Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
On Fri, Nov 1, 2013 at 11:28 AM, Ian Jackson <Ian= .Jackson@eu.citrix.com> wrote:
=
Shriram Rajagopalan writes ("[PATCH 4 o= f 5 V3] tools/libxl: Control network buffering in remus callbacks"):
> tools/libxl: Control network buffering in remus call= backs
...
> =A0static int libxl__remus_domain_suspend_callback(v= oid *data)
> =A0{
> - =A0 =A0/* REMUS TODO: Issue disk and network checkpoint reqs. */
> - =A0 =A0return libxl__domain_suspend_common_callback(data);
> + =A0 =A0/* REMUS TODO: Issue disk checkpoint reqs. */
> + =A0 =A0libxl__save_helper_state *shs =3D data;
> + =A0 =A0libxl__domain_suspend_state *dss =3D CONTAINER_OF(shs, *dss, = shs);
> + =A0 =A0libxl__remus_ctx *remus_ctx =3D dss->remus_ctx;
> + =A0 =A0bool is_suspended;
> + =A0 =A0STATE_AO_GC(dss->ao);
> +
> + =A0 =A0is_suspended =3D !!libxl__domain_suspend_common_callback(data= );

I think this function would be less confusing if the return value
variable was called "ok" or something similar. =A0It's really= just an
error flag.

Also AFAICT the !! has no function here. =A0The return value is
essentially a boolean.

> + =A0 =A0if (!remus_ctx->netbuf_ctx) return is_suspended;
> +
> + =A0 =A0if (is_suspended) {
> + =A0 =A0 =A0 =A0if (libxl__remus_netbuf_start_new_epoch(gc, dss->d= omid,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0remus_ctx))
> + =A0 =A0 =A0 =A0 =A0 =A0return !is_suspended;

This construction is rather odd. =A0Why return !is_suspended when you=
know that !!is_suspended ?


If a new netwo= rk buffer cannot be created (for the next epoch), then
return an = error. This ends up terminating the checkpointing process.

=A0
> @@ -1300,11 +1316,42 @@ static void libxl__remus_domain_checkpoi
> =A0static void remus_checkpoint_dm_saved(libxl__egc *egc,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0libxl__domain_suspend_state *dss, int rc)
> =A0{
> - =A0 =A0/* REMUS TODO: Wait for disk and memory ack, release network = buffer */
> - =A0 =A0/* REMUS TODO: make this asynchronous */
> - =A0 =A0assert(!rc); /* REMUS TODO handle this error properly */
> - =A0 =A0usleep(dss->remus_ctx->interval * 1000);
> - =A0 =A0libxl__xc_domain_saverestore_async_callback_done(egc, &ds= s->shs, 1);
> + =A0 =A0/*
> + =A0 =A0 * REMUS TODO: Wait for disk and explicit memory ack (through= restore
> + =A0 =A0 * callback from remote) before releasing network buffer.
> + =A0 =A0 */
> + =A0 =A0libxl__remus_ctx *remus_ctx =3D dss->remus_ctx;
> + =A0 =A0struct timespec epoch;
> + =A0 =A0int do_next_iter =3D 0;

Again, this variable is probably best called "ok".

> + =A0 =A0epoch.tv_sec =3D remus_ctx->interval / 1000; /* interval i= s in ms */
> + =A0 =A0epoch.tv_nsec =3D remus_ctx->interval * 1000L * 1000L;
> + =A0 =A0nanosleep(&epoch, 0);

This is no good, I'm afraid. =A0You should be using a libxl event= loop
timer. =A0(Also why are you changing your existing usleep to this new
nanosleep which seems functionally very similar?)

Ian.


--20cf301cc2daa5576f04ea23140b-- --===============8602480014145770308== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============8602480014145770308==--