All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk
@ 2014-07-07  8:26 Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 1/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

This patch series adds support for network buffering and drbd disk
in the Remus codebase in libxl.

the code is also hosted on github:

url: https://github.com/laijs/xen
branch: remus-v15

Changes in v15:
  The first patch in v14 has been taken, so remove it from the patchset.
  Add a patch to Update maintained files of REMUS.
  Rebased.

Changes in v14:
  Addressed IanJ's comments.
  Rebased.

Changes in v13:
  Addressed Konrad's comments.
  Rebased.

Changes in v12:
  Add disk buffering cmdline switch.

Changes in v11:
  Addressed comments from Ian J and Shriram.
  Add drbd disk implement into this patch series.

Changes in V10:
  Restructured the whole patch series.
  Introduce the remus device abstract layer.
  Make remus checkpoint asynchronous.

Changes in V9:
  Use async exec script api to exec scripts.

Changes in V8:
  Applied some comments(by IanJ).
  Merge some struct definitions to it's implementation.
  (2/3/5 in V7 => 3 in V8)

Changes in V7:
  Applied missing comments(by IanJ).
  Applied Shriram comments.

  merge netbufering tangled setup/teardown code into one patch.
  (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)

Changes in V6:
  Applied Ian Jackson's comments of V5 series.
  the [PATCH 2/4 V5] is split by small functionalities.

  [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is default enabled.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in xl remus
      commmand.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  Few things to note(by shriram): 

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
   (Linux).  So I have made network buffering support an optional feature
   so that it can be disabled if desired.

   c) NetBSD does not have libnl3. So I have put the setup script under
   tools/hotplug/Linux folder.

thanks

Legend:
  A - acked
  D - previous acked, but new change introduced so acked-by dropped
  M - Modified
  S - the same version as last round
  No marker - new patch

Yang Hongyang (7):
  A remus: add libnl3 dependency for network buffering support
  S remus: introduce remus device
  S remus netbuffer: implement remus network buffering for nic devices
  S remus drbd: Implement remus drbd replicated disk
  S libxl: network buffering cmdline switch
  S libxl: disk buffering cmdline switch
    MAINTAINERS: Update maintained files of REMUS

 MAINTAINERS                            |   5 +
 README                                 |   4 +
 config/Tools.mk.in                     |   4 +
 docs/README.remus                      |   6 +
 docs/man/xl.conf.pod.5                 |   6 +
 docs/man/xl.pod.1                      |  15 +-
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/configure.ac                     |  16 ++
 tools/hotplug/Linux/Makefile           |   2 +
 tools/hotplug/Linux/block-drbd-probe   |  85 ++++++
 tools/hotplug/Linux/remus-netbuf-setup | 203 +++++++++++++
 tools/libxl/Makefile                   |  15 +
 tools/libxl/libxl.c                    |  58 +++-
 tools/libxl/libxl.h                    |   6 +
 tools/libxl/libxl_dom.c                | 132 ++++++++-
 tools/libxl/libxl_internal.h           | 199 +++++++++++++
 tools/libxl/libxl_netbuffer.c          | 506 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  58 ++++
 tools/libxl/libxl_remus_device.c       | 411 ++++++++++++++++++++++++++
 tools/libxl/libxl_remus_disk_drbd.c    | 239 ++++++++++++++++
 tools/libxl/libxl_types.idl            |   4 +
 tools/libxl/xl.c                       |   4 +
 tools/libxl/xl.h                       |   1 +
 tools/libxl/xl_cmdimpl.c               |  32 ++-
 tools/libxl/xl_cmdtable.c              |   4 +
 25 files changed, 1993 insertions(+), 26 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c
 create mode 100644 tools/libxl/libxl_remus_device.c
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v15 1/7] remus: add libnl3 dependency for network buffering support
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 2/7] remus: introduce remus device Yang Hongyang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
Also provide ability to configure tools without libnl3 support, that
is without network buffering support.

when there's no network buffering support,libxl__netbuffer_enabled()
returns 0, otherwise returns 1. The callers of this api will be
introduced in the rest of the series.

NOTE: This patch changes tools/configure.ac, please rerun
      autogen.sh while apply the patch.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
 README                          |  4 ++++
 config/Tools.mk.in              |  4 ++++
 docs/README.remus               |  6 ++++++
 tools/configure.ac              | 16 ++++++++++++++++
 tools/libxl/Makefile            | 13 +++++++++++++
 tools/libxl/libxl_internal.h    |  1 +
 tools/libxl/libxl_netbuffer.c   | 31 +++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
 8 files changed, 106 insertions(+)
 create mode 100644 tools/libxl/libxl_netbuffer.c
 create mode 100644 tools/libxl/libxl_nonetbuffer.c

diff --git a/README b/README
index 9bbe734..e770932 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
     * cmake (if building vtpm stub domains)
     * markdown
     * figlet (for generating the traditional Xen start of day banner)
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 852c941..003ba95 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -38,6 +38,9 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
+
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -57,6 +60,7 @@ CONFIG_BLKTAP1      := @blktap1@
 CONFIG_BLKTAP2      := @blktap2@
 CONFIG_VTPM         := @vtpm@
 CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff --git a/docs/README.remus b/docs/README.remus
index 9fa00fe..ddf5b55 100644
--- a/docs/README.remus
+++ b/docs/README.remus
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://wiki.xen.org/wiki/Remus for details.
+
+Using Remus with libxl on Xen 4.5 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
diff --git a/tools/configure.ac b/tools/configure.ac
index 6d70f04..a1f47e2 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -287,5 +287,21 @@ esac
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+    [libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+    AC_MSG_WARN([Disabling support for Remus network buffering.
+    Please install libnl3 libraries, command line tools and devel
+    headers - version 3.2.8 or higher])
+    AC_SUBST(remus_netbuf, [n])
+    ],[
+    AC_SUBST(remus_netbuf, [y])
+])
+
+AC_SUBST(LIBNL3_LIBS)
+AC_SUBST(LIBNL3_CFLAGS)
+
 AC_OUTPUT()
 
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 78c1996..086fbe0 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,17 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
 CFLAGS_LIBXL += -Wshadow
 
 LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
@@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e8f2abb..f8441f6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2468,6 +2468,7 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
 
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..52d593c
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..1c72a7f
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v15 2/7] remus: introduce remus device
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 1/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  2014-07-09 17:26   ` Ian Jackson
  2014-07-07  8:26 ` [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

introduce remus device, an abstract layer of remus devices(nic, disk,
etc).It provides the following APIs for libxl:
  >libxl__remus_device_setup
    setup remus devices, like attach qdisc, enable disk buffering, etc
  >libxl__remus_device_teardown
    teardown devices
  >libxl__remus_device_postsuspend
  >libxl__remus_device_preresume
  >libxl__remus_device_commit
    above three are for checkpoint.
through remus device layer, the remus execution flow will be like
this:
  xl remus -> remus device setup
                |-> remus checkpoint(postsuspend, preresume, commit)
                      ...
                       |-> remus device teardown, failover or abort
the remus device layer provides an interface
  libxl__remus_device_ops
which a remus device must implement. the whole remus structure:
                            |remus|
                               |
                        |remus device|
                               |
                |nic| |drbd disks| |qemu disks| ...
a device(nic, drbd disks, qemu disks, etc) must implement
libxl__remus_device_ops to support remus.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 tools/libxl/Makefile             |   2 +
 tools/libxl/libxl.c              |  34 +++-
 tools/libxl/libxl.h              |   6 +
 tools/libxl/libxl_dom.c          | 132 ++++++++++++--
 tools/libxl/libxl_internal.h     | 184 +++++++++++++++++++
 tools/libxl/libxl_remus_device.c | 379 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl      |   1 +
 7 files changed, 722 insertions(+), 16 deletions(-)
 create mode 100644 tools/libxl/libxl_remus_device.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 086fbe0..9c81a59 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,6 +56,8 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
+LIBXL_OBJS-y += libxl_remus_device.o
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 39f1c28..bcbd02b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -733,6 +733,31 @@ out:
 static void remus_failover_cb(libxl__egc *egc,
                               libxl__domain_suspend_state *dss, int rc);
 
+static void libxl__remus_setup_failed(libxl__egc *egc,
+                                      libxl__remus_state *rs, int rc)
+{
+    STATE_AO_GC(rs->ao);
+    libxl__ao_complete(egc, ao, rc);
+}
+
+static void libxl__remus_setup_done(libxl__egc *egc,
+                                    libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    STATE_AO_GC(rs->ao);
+
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
+        dss->domid);
+    rs->saved_rc = rc;
+    rs->callback = libxl__remus_setup_failed;
+    libxl__remus_device_teardown(egc, rs);
+}
+
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
                              uint32_t domid, int send_fd, int recv_fd,
@@ -761,10 +786,15 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* Convenience aliases */
+    libxl__remus_state *const rs = &dss->rs;
+    rs->ao = ao;
+    rs->domid = domid;
+    rs->saved_rc = 0;
+    rs->callback = libxl__remus_setup_done;
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_device_setup(egc, rs);
     return AO_INPROGRESS;
 
  out:
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 459557d..0fb1aad 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -565,6 +565,12 @@ typedef struct libxl__ctx libxl_ctx;
  */
 #define LIBXL_HAVE_CPUPOOL_NAME 1
 
+/*
+ * LIBXL_HAVE_REMUS
+ * If this is defined, then libxl supports remus.
+ */
+#define LIBXL_HAVE_REMUS 1
+
 typedef uint8_t libxl_mac[6];
 #define LIBXL_MAC_FMT "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx"
 #define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 83eb29a..8cc90dc 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data)
     domain_suspend_callback_common(egc, dss);
 }
 
+static void remus_device_postsuspend_cb(libxl__egc *egc,
+                                        libxl__remus_state *rs, int rc)
+{
+    int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    if (!rc)
+        ok = 1;
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
 static void domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
@@ -1475,32 +1486,51 @@ static void libxl__remus_domain_suspend_callback(void *data)
 static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
                                 libxl__domain_suspend_state *dss, int ok)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
+    if (!ok)
+        goto out;
+
+    libxl__remus_state *const rs = &dss->rs;
+    rs->callback = remus_device_postsuspend_cb;
+    libxl__remus_device_postsuspend(egc, rs);
+    return;
+
+out:
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
 }
 
-static void libxl__remus_domain_resume_callback(void *data)
+static void remus_device_preresume_cb(libxl__egc *egc,
+                                      libxl__remus_state *rs, int rc)
 {
     int ok = 0;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    STATE_AO_GC(dss->ao);
+
+    if (!rc) {
+        /* Resumes the domain and the device model */
+        if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+            ok = 1;
+    }
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void libxl__remus_domain_resume_callback(void *data)
+{
     libxl__save_helper_state *shs = data;
     libxl__egc *egc = shs->egc;
     libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
     STATE_AO_GC(dss->ao);
 
-    /* Resumes the domain and the device model */
-    if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
-        goto out;
-
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
-    ok = 1;
-out:
-    libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+    libxl__remus_state *const rs = &dss->rs;
+    rs->callback = remus_device_preresume_cb;
+    libxl__remus_device_preresume(egc, rs);
 }
 
 /*----- remus asynchronous checkpoint callback -----*/
 
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
 
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
@@ -1517,13 +1547,67 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
     }
 }
 
+static void remus_device_commit_cb(libxl__egc *egc,
+                                   libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to do device commit op."
+            " Terminating Remus..");
+        goto out;
+    } else {
+        /* Set checkpoint interval timeout */
+        rc = libxl__ev_time_register_rel(gc, &rs->timeout,
+                                         remus_next_checkpoint,
+                                         dss->interval);
+        if (rc) {
+            LOG(ERROR, "unable to register timeout for next epoch."
+                " Terminating Remus..");
+            goto out;
+        }
+    }
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    /* Convenience aliases */
+    libxl__remus_state *const rs = &dss->rs;
+
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    rs->callback = remus_device_commit_cb;
+    libxl__remus_device_commit(egc, rs);
+
+    return;
+
+out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__remus_state *rs = CONTAINER_OF(ev, *rs, timeout);
+
+    /* Convenience aliases */
+    libxl__domain_suspend_state *const dss = CONTAINER_OF(rs, *dss, rs);
+
+    STATE_AO_GC(dss->ao);
+
+    libxl__ev_time_deregister(gc, &rs->timeout);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1738,6 +1822,13 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
     dss->save_dm_callback(egc, dss, our_rc);
 }
 
+static void libxl__remus_teardown_done(libxl__egc *egc,
+                                       libxl__remus_state *rs, int rc)
+{
+    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
+    dss->callback(egc, dss, rc);
+}
+
 static void domain_suspend_done(libxl__egc *egc,
                         libxl__domain_suspend_state *dss, int rc)
 {
@@ -1752,6 +1843,19 @@ static void domain_suspend_done(libxl__egc *egc,
         xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
                            dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
 
+    if (dss->remus) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Teardown the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        dss->rs.saved_rc = rc;
+        dss->rs.callback = libxl__remus_teardown_done;
+        libxl__remus_device_teardown(egc, &dss->rs);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f8441f6..8ef20a0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2468,6 +2468,189 @@ typedef struct libxl__save_helper_state {
                       * marshalling and xc callback functions */
 } libxl__save_helper_state;
 
+/*----- remus device related state structure -----*/
+/* remus device is an abstract layer of remus devices(nic, disk,
+ * etc).It provides the following APIs for libxl:
+ *   >libxl__remus_device_setup
+ *     setup remus devices, like attach qdisc, enable disk buffering, etc
+ *   >libxl__remus_device_teardown
+ *     teardown devices
+ *   >libxl__remus_device_postsuspend
+ *   >libxl__remus_device_preresume
+ *   >libxl__remus_device_commit
+ *     above three are for checkpoint.
+ * through remus device layer, the remus execution flow will be like
+ * this:
+ * xl remus -> remus device setup
+ *               |-> remus checkpoint(postsuspend, preresume, commit)
+ *                     ...
+ *                      |-> remus device teardown, failover or abort
+ * the remus device layer provides an interface
+ *   libxl__remus_device_ops
+ * which a remus device must implement. the whole remus structure:
+ *                           |remus|
+ *                              |
+ *                       |remus device|
+ *                              |
+ *               |nic| |drbd disks| |qemu disks| ...
+ * a device(nic, drbd disks, qemu disks, etc) must implement
+ * libxl__remus_device_ops to support remus.
+ */
+
+typedef enum libxl__remus_device_kind {
+    LIBXL__REMUS_DEVICE_NIC,
+    LIBXL__REMUS_DEVICE_DISK,
+} libxl__remus_device_kind;
+
+typedef struct libxl__remus_state libxl__remus_state;
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_device_state libxl__remus_device_state;
+typedef struct libxl__remus_device_ops libxl__remus_device_ops;
+
+struct libxl__remus_device_ops {
+    /* Which kind of device's ops */
+    libxl__remus_device_kind kind;
+    /*
+     * init() and destroy() APIs are produced by a device type and
+     * consumed by the main remus code, a device type must implement
+     * these two APIs.
+     */
+    /* init device ops private data, etc. must implement */
+    int (*init)(const libxl__remus_device_ops *self,
+                libxl__remus_state *rs);
+    /* free device ops private data, etc. must implement */
+    void (*destroy)(const libxl__remus_device_ops *self,
+                    libxl__remus_state *rs);
+
+    /*
+     * checkpoint callbacks, these are async ops, call dev->callback
+     * when done. These function pointers may be NULL, means the op is
+     * not implemented, and it will do nothing when checkpoint.
+     * The callers of these APIs must check the function pointer first.
+     * These callbacks can be implemented synchronously, call
+     * dev->callback at last directly.
+     */
+    void (*postsuspend)(libxl__remus_device *dev);
+    void (*preresume)(libxl__remus_device *dev);
+    void (*commit)(libxl__remus_device *dev);
+
+    /*
+     * This API determines whether the ops matchs the specific device. In the
+     * implementation, we first init all device ops, for example, NIC ops,
+     * DRBD ops ... Then we will find out the libxl devices, and match the
+     * device with the ops, if the device is a drbd disk, then it will be
+     * matched with DRBD ops, and the further ops(such as checkpoint ops etc.)
+     * of this device will using DRBD ops. This API is mainly for disks,
+     * because we must use an external script to determine whether a
+     * libxl_disk is a DRBD disk.
+     * This function pointer may be NULL. That means this *kind* of
+     * device's ops is always matched with the *kind* of device.
+     * It's an async op and must be implemented asynchronously,
+     * call dev->callback when done.
+     */
+    void (*match)(const libxl__remus_device_ops *self,
+                  libxl__remus_device *dev);
+
+    /*
+     * setup() and teardown() are refer to the actual remus device,
+     * a device type must implement these two APIs. They are async
+     * ops, and call dev->callback when done.
+     * These callbacks can be implemented synchronously, call
+     * dev->callback at last directly.
+     */
+    /* setup the remus device */
+    void (*setup)(libxl__remus_device *dev);
+
+    /* teardown the remus device */
+    void (*teardown)(libxl__remus_device *dev);
+};
+
+/*
+ * This structure is for remus device layer, it records remus devices
+ * that have been setuped.
+ */
+struct libxl__remus_device_state {
+    libxl__ao *ao;
+    libxl__egc *egc;
+
+    /* devices that have been setuped */
+    libxl__remus_device **dev;
+
+    int num_nics;
+    int num_disks;
+
+    /* for counting devices that have been handled */
+    int num_devices;
+    /* for counting devices that matched and setuped */
+    int num_set_up;
+};
+
+typedef void libxl__remus_device_callback(libxl__egc *,
+                                          libxl__remus_device *,
+                                          int rc);
+/*
+ * This structure is init and setup by remus device abstruct layer,
+ * and pass to remus device ops
+ */
+struct libxl__remus_device {
+    /*----- shared between abstract and concrete layers -----*/
+    /* set by remus device abstruct layer */
+    /* libxl__device_* which this remus device related to */
+    const void *backend_dev;
+    libxl__remus_device_kind kind;
+
+    /*----- private for abstract layer only -----*/
+    /*
+     * This is for matching, we must go through all device ops until we
+     * find a matched ops for the device. The ops_index record which ops
+     * we are matching.
+     */
+    int ops_index;
+    const libxl__remus_device_ops *ops;
+    libxl__remus_device_callback *callback;
+    libxl__remus_device_state *rds;
+
+    /*----- private for concrete (device-specific) layer -----*/
+    /* *kind* of device's private data */
+    void *data;
+    /* for calling scripts, eg. setup|teardown|match scripts */
+    libxl__async_exec_state aes;
+    /*
+     * for async func calls, in the implementation of device ops, we
+     * may use fork to do async ops. this is owned by device-specific
+     * ops methods
+     */
+    libxl__ev_child child;
+};
+
+typedef void libxl__remus_callback(libxl__egc *,
+                                   libxl__remus_state *, int rc);
+
+struct libxl__remus_state {
+    /* must set by caller of libxl__remus_device_(setup|teardown) */
+    libxl__ao *ao;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+
+    /* private */
+    int saved_rc;
+    /* context containing device related stuff */
+    libxl__remus_device_state dev_state;
+
+    libxl__ev_time timeout; /* used for checkpoint */
+};
+
+/* the following 5 APIs are async ops, call rs->callback when done */
+_hidden void libxl__remus_device_setup(libxl__egc *egc,
+                                       libxl__remus_state *rs);
+_hidden void libxl__remus_device_teardown(libxl__egc *egc,
+                                          libxl__remus_state *rs);
+_hidden void libxl__remus_device_postsuspend(libxl__egc *egc,
+                                             libxl__remus_state *rs);
+_hidden void libxl__remus_device_preresume(libxl__egc *egc,
+                                           libxl__remus_state *rs);
+_hidden void libxl__remus_device_commit(libxl__egc *egc,
+                                        libxl__remus_state *rs);
 _hidden int libxl__netbuffer_enabled(libxl__gc *gc);
 
 /*----- Domain suspend (save) state structure -----*/
@@ -2498,6 +2681,7 @@ struct libxl__domain_suspend_state {
     int live;
     int debug;
     const libxl_domain_remus_info *remus;
+    libxl__remus_state rs;
     /* private */
     libxl__ev_evtchn guest_evtchn;
     int guest_evtchn_lockfd;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..cd71242
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,379 @@
+/*
+ * Copyright (C) 2014
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static const libxl__remus_device_ops *dev_ops[] = {
+};
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void device_common_cb(libxl__egc *egc,
+                             libxl__remus_device *dev,
+                             int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *const rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    rds->num_devices++;
+
+    if (rc)
+        rs->saved_rc = ERROR_FAIL;
+
+    if (rds->num_devices == rds->num_set_up)
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+/* API implementations */
+
+void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    rs->saved_rc = 0;
+
+    if(rds->num_set_up == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_set_up; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->postsuspend) {
+            dev->ops->postsuspend(dev);
+        } else {
+            rds->num_devices++;
+            if (rds->num_devices == rds->num_set_up)
+                rs->callback(egc, rs, rs->saved_rc);
+        }
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    rs->saved_rc = 0;
+
+    if(rds->num_set_up == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_set_up; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->preresume) {
+            dev->ops->preresume(dev);
+        } else {
+            rds->num_devices++;
+            if (rds->num_devices == rds->num_set_up)
+                rs->callback(egc, rs, rs->saved_rc);
+        }
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+}
+
+void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i;
+    libxl__remus_device *dev;
+    STATE_AO_GC(rs->ao);
+
+    /*
+     * REMUS TODO: Wait for disk and explicit memory ack (through restore
+     * callback from remote) before releasing network buffer.
+     */
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    rds->num_devices = 0;
+    rs->saved_rc = 0;
+
+    if(rds->num_set_up == 0)
+        goto out;
+
+    for (i = 0; i < rds->num_set_up; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_common_cb;
+        if (dev->ops->commit) {
+            dev->ops->commit(dev);
+        } else {
+            rds->num_devices++;
+            if (rds->num_devices == rds->num_set_up)
+                rs->callback(egc, rs, rs->saved_rc);
+        }
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+}
+
+/*----- main flow of control -----*/
+
+/* callbacks */
+
+static void device_setup_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc);
+static void device_match_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+{
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    if (rc) {
+        if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||
+            rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
+            /* the device can not be matched */
+            rds->num_devices++;
+            rs->saved_rc = ERROR_FAIL;
+            if (rds->num_devices == (rds->num_nics + rds->num_disks))
+                rs->callback(egc, rs, rs->saved_rc);
+            return;
+        }
+        /* the ops does not match, try next ops */
+        for ( ; dev->ops_index < ARRAY_SIZE(dev_ops); dev->ops_index++) {
+            dev->ops = dev_ops[dev->ops_index];
+            if (dev->ops->kind == dev->kind) {
+                /*
+                 * we have entered match process, that means this *kind* of
+                 * device's ops must have a match() implementation.
+                 */
+                assert(dev->ops->match);
+                dev->ops->match(dev->ops, dev);
+                break;
+            }
+        }
+    } else {
+        /* the ops matched, setup the device */
+        dev->callback = device_setup_cb;
+        dev->ops->setup(dev);
+    }
+}
+
+static void device_setup_cb(libxl__egc *egc,
+                            libxl__remus_device *dev,
+                            int rc)
+{
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *const rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    rds->num_devices++;
+    /*
+     * we add devices that have been setuped to the array no matter
+     * the setup process succeed or failed because we need to ensure
+     * the device been teardown while setup failed. If any of the
+     * device setup failed, we will quit remus, but before we exit,
+     * we will teardown the devices that have been added to **dev
+     */
+    rds->dev[rds->num_set_up++] = dev;
+    if (rc) {
+        /* setup failed */
+        rs->saved_rc = ERROR_FAIL;
+    }
+
+    if (rds->num_devices == (rds->num_nics + rds->num_disks))
+        rs->callback(egc, rs, rs->saved_rc);
+}
+
+static void destroy_device_ops(libxl__remus_state *rs);
+static void device_teardown_cb(libxl__egc *egc,
+                               libxl__remus_device *dev,
+                               int rc)
+{
+    libxl__remus_device_state *const rds = dev->rds;
+    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
+
+    STATE_AO_GC(rs->ao);
+
+    /* ignore teardown errors to teardown as many devs as possible*/
+    rds->num_set_up--;
+
+    if (rds->num_set_up == 0) {
+        destroy_device_ops(rs);
+        rs->callback(egc, rs, rs->saved_rc);
+    }
+}
+
+/* remus device setup and teardown */
+
+static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+                                     libxl__remus_device_state *rds,
+                                     libxl__remus_device_kind kind,
+                                     void *libxl_dev)
+{
+    libxl__remus_device *dev = NULL;
+
+    STATE_AO_GC(rds->ao);
+    GCNEW(dev);
+    dev->backend_dev = libxl_dev;
+    dev->kind = kind;
+    dev->rds = rds;
+
+    libxl__async_exec_init(&dev->aes);
+    libxl__ev_child_init(&dev->child);
+
+    /* match the ops begin */
+    for (dev->ops_index = 0;
+         dev->ops_index < ARRAY_SIZE(dev_ops);
+         dev->ops_index++) {
+        dev->ops = dev_ops[dev->ops_index];
+        if (dev->ops->kind == dev->kind) {
+            if (dev->ops->match) {
+                dev->callback = device_match_cb;
+                dev->ops->match(dev->ops, dev);
+            } else {
+                /*
+                 * This devops do not have match() implementation.
+                 * That means this *kind* of device's ops is always
+                 * matched with the *kind* of device.
+                 */
+                dev->callback = device_setup_cb;
+                dev->ops->setup(dev);
+            }
+            break;
+        }
+    }
+}
+
+static int init_device_ops(libxl__remus_state *rs)
+{
+    int i, rc;
+    const libxl__remus_device_ops *ops;
+
+    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+        ops = dev_ops[i];
+        if (ops->init(ops, rs)) {
+            rc = ERROR_FAIL;
+            goto out;
+        }
+    }
+
+    rc = 0;
+out:
+    return rc;
+
+}
+
+static void destroy_device_ops(libxl__remus_state *rs)
+{
+    int i;
+    const libxl__remus_device_ops *ops;
+
+    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
+        ops = dev_ops[i];
+        ops->destroy(ops, rs);
+    }
+}
+
+void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
+{
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    if (ARRAY_SIZE(dev_ops) == 0)
+        goto out;
+
+    rs->saved_rc = init_device_ops(rs);
+    if (rs->saved_rc)
+        goto out;
+
+    rds->ao = rs->ao;
+    rds->egc = egc;
+    rds->num_devices = 0;
+    rds->num_nics = 0;
+    rds->num_disks = 0;
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    if (rds->num_nics == 0 && rds->num_disks == 0)
+        goto out;
+
+    GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
+
+    /* TBD: CALL libxl__remus_device_init to init remus devices */
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+    return;
+}
+
+void libxl__remus_device_teardown(libxl__egc *egc, libxl__remus_state *rs)
+{
+    int i, num_set_up;
+    libxl__remus_device *dev;
+
+    STATE_AO_GC(rs->ao);
+
+    /* Convenience aliases */
+    libxl__remus_device_state *const rds = &rs->dev_state;
+
+    if (rds->num_set_up == 0) {
+        destroy_device_ops(rs);
+        goto out;
+    }
+
+    /* we will decrease rds->num_set_up in the teardown callback */
+    num_set_up = rds->num_set_up;
+    for (i = 0; i < num_set_up; i++) {
+        dev = rds->dev[i];
+        dev->callback = device_teardown_cb;
+        dev->ops->teardown(dev);
+    }
+
+    return;
+
+out:
+    rs->callback(egc, rs, rs->saved_rc);
+    return;
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index de25f42..e56567f 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [
     (-12, "OSEVENT_REG_FAIL"),
     (-13, "BUFFERFULL"),
     (-14, "UNKNOWN_CHILD"),
+    (-15, "REMUS_DEVOPS_NOT_MATCH"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 1/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 2/7] remus: introduce remus device Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  2014-07-09 23:15   ` Ian Jackson
  2014-07-07  8:26 ` [PATCH v15 4/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

1.Add two members in libxl_domain_remus_info:
    netbuf: whether netbuf is enabled
    netbufscript: the path of the script which will be run to setup
       and tear down the guest's interface.
2.Introduce remus-netbuf-setup hotplug script responsible for
  setting up and tearing down the necessary infrastructure required for
  network output buffering in Remus.  This script is intended to be invoked
  by libxl for each guest interface, when starting or stopping Remus.

  Apart from returning success/failure indication via the usual hotplug
  entries in xenstore, this script also writes to xenstore, the name of
  the IFB device to be used to control the vif's network output.

  The script relies on libnl3 command line utilities to perform various
  setup/teardown functions. The script is confined to Linux platforms only
  since NetBSD does not seem to have libnl3.

  The following steps are taken during init:
    a) establish a dedicated remus context containing libnl related
       state (netlink sockets)

  The following steps are taken for each vif during setup:
    a) call the hotplug script to setup its network buffer and
       init qdisc caches

    b) Obtain handles to plug qdiscs installed on the IFB devices
       chosen by the hotplug scripts.

  And during teardown, the netlink resources are released, followed by
  invocation of hotplug scripts to remove the ifb devices.
3.Implement the remus device interface. setup, teardown, etc. The
  checkpoint callbacks for netbuffer are implemented as sync op because
  net ops are quick enough and will not block.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 203 ++++++++++++++
 tools/libxl/libxl.c                    |  21 +-
 tools/libxl/libxl_internal.h           |  10 +
 tools/libxl/libxl_netbuffer.c          | 475 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  27 ++
 tools/libxl/libxl_remus_device.c       |  21 +-
 tools/libxl/libxl_types.idl            |   2 +
 9 files changed, 759 insertions(+), 5 deletions(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index ea67536..d94ea9d 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -393,6 +393,10 @@ The guest's virtual time offset from UTC in seconds.
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index d5de9e6..721f8c0 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..58c46f3
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,203 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+#specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+#return 0 if the ifb is free
+check_ifb() {
+    local installed=`nl-qdisc-list -d $1`
+    [ -n "$installed" ] && return 1
+
+    for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+    do
+        [ $domid -eq 0 ] && continue
+        xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+        for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+        do
+            local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+            xenstore-exists $path || continue
+            local ifb=`xenstore-read "$path" 2>/dev/null || true`
+            [ "$ifb" = "$1" ] && return 1
+        done
+    done
+
+    return 0
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        check_ifb "$ifb" || continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    #not using xenstore_write that automatically exits on error
+    #because we need to cleanup
+    _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. Its okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index bcbd02b..b7d62c1 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -788,6 +788,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     /* Convenience aliases */
     libxl__remus_state *const rs = &dss->rs;
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            rs->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            rs->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                libxl__xen_script_dir_path());
+        }
+    }
+
     rs->ao = ao;
     rs->domid = domid;
     rs->saved_rc = 0;
@@ -811,9 +829,6 @@ static void remus_failover_cb(libxl__egc *egc,
      * from sending checkpoints.
      */
 
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 8ef20a0..2fe36a6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -310,6 +310,10 @@ struct libxl__gc {
     libxl_ctx *owner;
 };
 
+/* remus device ops specific structures start */
+typedef struct libxl__remus_netbuf_state libxl__remus_netbuf_state;
+/* remus device ops specific structures end */
+
 struct libxl__ctx {
     xentoollog_logger *lg;
     xc_interface *xch;
@@ -374,6 +378,9 @@ struct libxl__ctx {
     LIBXL_LIST_ENTRY(libxl_ctx) sigchld_users_entry;
 
     libxl_version_info version_info;
+
+    /* remus device ops specific structures */
+    libxl__remus_netbuf_state *rns;
 };
 
 typedef struct {
@@ -2576,6 +2583,7 @@ struct libxl__remus_device_state {
     /* devices that have been setuped */
     libxl__remus_device **dev;
 
+    libxl_device_nic *nics;
     int num_nics;
     int num_disks;
 
@@ -2631,6 +2639,8 @@ struct libxl__remus_state {
     libxl__ao *ao;
     uint32_t domid;
     libxl__remus_callback *callback;
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
 
     /* private */
     int saved_rc;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 52d593c..025ee89 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,486 @@
 
 #include "libxl_internal.h"
 
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+struct libxl__remus_netbuf_state {
+    libxl__ao *ao;
+    uint32_t domid;
+    const char *netbufscript;
+
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+};
+
+typedef struct libxl__remus_device_nic {
+    int devid;
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }
 
+/*----- init() and destroy() -----*/
+
+static int nic_init(const libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    int rc;
+    libxl__remus_netbuf_state *ns;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(ns);
+    CTX->rns = ns;
+
+    ns->nlsock = nl_socket_alloc();
+    if (!ns->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto out;
+    }
+
+    rc = nl_connect(ns->nlsock, NETLINK_ROUTE);
+    if (rc) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    rc = rtnl_qdisc_alloc_cache(ns->nlsock, &ns->qdisc_cache);
+    if (rc) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    ns->ao = rs->ao;
+    ns->domid = rs->domid;
+    ns->netbufscript = rs->netbufscript;
+
+    return 0;
+
+out:
+    return ERROR_FAIL;
+}
+
+static void nic_destroy(const libxl__remus_device_ops *self,
+                        libxl__remus_state *rs)
+{
+    STATE_AO_GC(rs->ao);
+    libxl__remus_netbuf_state *ns = CTX->rns;
+
+    if (!ns)
+        return;
+
+    /* free qdisc cache */
+    if (ns->qdisc_cache) {
+        nl_cache_clear(ns->qdisc_cache);
+        nl_cache_free(ns->qdisc_cache);
+        ns->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (ns->nlsock) {
+        nl_close(ns->nlsock);
+        nl_socket_free(ns->nlsock);
+        ns->nlsock = NULL;
+    }
+}
+
+/*----- checkpointing APIs -----*/
+
+/* The buffer_op's value, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+/* API implementations */
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+                           libxl__remus_netbuf_state *netbuf_state,
+                           int buffer_op)
+{
+    int rc;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (buffer_op == tc_buffer_start)
+        rc = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        rc = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (rc)
+        goto out;
+
+    rc = rtnl_qdisc_add(netbuf_state->nlsock,
+                        remus_nic->qdisc,
+                        NLM_F_REQUEST);
+    if (rc)
+        goto out;
+
+    return 0;
+
+out:
+    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+        ((buffer_op == tc_buffer_start) ?
+        "start_new_epoch" : "release_prev_epoch"),
+        remus_nic->ifb, nl_geterror(rc));
+    return ERROR_FAIL;
+}
+
+static void nic_postsuspend(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    STATE_AO_GC(dev->rds->ao);
+    libxl__remus_netbuf_state *ns = CTX->rns;
+
+    rc = remus_netbuf_op(remus_nic, ns, tc_buffer_start);
+    dev->callback(dev->rds->egc, dev, rc);
+}
+
+static void nic_commit(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    STATE_AO_GC(dev->rds->ao);
+    libxl__remus_netbuf_state *ns = CTX->rns;
+
+    rc = remus_netbuf_op(remus_nic, ns, tc_buffer_release);
+    dev->callback(dev->rds->egc, dev, rc);
+}
+
+/*----- main flow of control -----*/
+
+/* helper functions */
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *netbuf_state = CTX->rns;
+    const uint32_t domid = netbuf_state->domid;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    /* free qdiscs */
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int rc, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Now that we have brought up IFB device with plug qdisc for
+     * this vif, so we need to refill the qdisc cache.
+     */
+    rc = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
+    if (rc) {
+        LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(rc));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    /* get a handle to the IFB interface */
+    ifb = NULL;
+    rc = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                               remus_nic->ifb, &ifb);
+    if (rc) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(rc));
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = ERROR_FAIL;
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the IFB dev.
+
+     * There is no need to explicitly free this qdisc as its just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                     TC_H_ROOT);
+
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            nl_object_put((struct nl_object *)qdisc);
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+        rc = 0;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+    }
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    return rc;
+}
+
+/* callbacks */
+
+/*
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *netbuf_state = CTX->rns;
+    const uint32_t domid = netbuf_state->domid;
+    const int devid = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    /*
+     * we need to get ifb first because it's needed for teardown
+     */
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc)
+        goto out;
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            netbuf_state->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(netbuf_state, remus_nic);
+
+out:
+    dev->callback(egc, dev, rc);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    dev->callback(egc, dev, rc);
+}
+
+/* setup and teardown */
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__async_exec_state *aes,
+                             char *op, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Convenience aliases */
+    libxl__remus_netbuf_state *ns = CTX->rns;
+    char *const script = libxl__strdup(gc, ns->netbufscript);
+    const uint32_t domid = ns->domid;
+    const int dev_id = remus_nic->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->rds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+static void nic_setup(libxl__remus_device *dev)
+{
+    int rc;
+    libxl__remus_device_nic *remus_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->rds->ao);
+
+    GCNEW(remus_nic);
+    dev->data = remus_nic;
+    remus_nic->devid = nic->devid;
+    remus_nic->vif = get_vifname(dev, nic);
+    if (!remus_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(&dev->aes, "setup", dev);
+    rc = libxl__async_exec_start(gc, &dev->aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, rc);
+}
+
+static void nic_teardown(libxl__remus_device *dev)
+{
+    int rc;
+    STATE_AO_GC(dev->rds->ao);
+
+    setup_async_exec(&dev->aes, "teardown", dev);
+
+    rc = libxl__async_exec_start(gc, &dev->aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, rc);
+}
+
+const libxl__remus_device_ops remus_device_nic = {
+    .kind = LIBXL__REMUS_DEVICE_NIC,
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .postsuspend = nic_postsuspend,
+    .commit = nic_commit,
+    .setup = nic_setup,
+    .teardown = nic_teardown,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 1c72a7f..5390274 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,33 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
     return 0;
 }
 
+static void nic_match(const libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    STATE_AO_GC(dev->rds->ao);
+
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static int nic_init(const libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    return 0;
+}
+
+static void nic_destroy(const libxl__remus_device_ops *self,
+                        libxl__remus_state *rs)
+{
+    return;
+}
+
+const libxl__remus_device_ops remus_device_nic = {
+    .kind = LIBXL__REMUS_DEVICE_NIC,
+    .init = nic_init,
+    .destroy = nic_destroy,
+    .match = nic_match,
+};
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index cd71242..5ef11b9 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -17,7 +17,9 @@
 
 #include "libxl_internal.h"
 
+extern const libxl__remus_device_ops remus_device_nic;
 static const libxl__remus_device_ops *dev_ops[] = {
+    &remus_device_nic,
 };
 
 /*----- checkpointing APIs -----*/
@@ -227,6 +229,7 @@ static void device_teardown_cb(libxl__egc *egc,
                                libxl__remus_device *dev,
                                int rc)
 {
+    int i;
     libxl__remus_device_state *const rds = dev->rds;
     libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
 
@@ -236,6 +239,13 @@ static void device_teardown_cb(libxl__egc *egc,
     rds->num_set_up--;
 
     if (rds->num_set_up == 0) {
+        /* clean nic */
+        for (i = 0; i < rds->num_nics; i++)
+            libxl_device_nic_dispose(&rds->nics[i]);
+        free(rds->nics);
+        rds->nics = NULL;
+        rds->num_nics = 0;
+
         destroy_device_ops(rs);
         rs->callback(egc, rs, rs->saved_rc);
     }
@@ -243,7 +253,7 @@ static void device_teardown_cb(libxl__egc *egc,
 
 /* remus device setup and teardown */
 
-static __attribute__((unused)) void libxl__remus_device_init(libxl__egc *egc,
+static void libxl__remus_device_init(libxl__egc *egc,
                                      libxl__remus_device_state *rds,
                                      libxl__remus_device_kind kind,
                                      void *libxl_dev)
@@ -314,6 +324,7 @@ static void destroy_device_ops(libxl__remus_state *rs)
 
 void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
 {
+    int i;
     STATE_AO_GC(rs->ao);
 
     /* Convenience aliases */
@@ -332,7 +343,9 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     rds->num_nics = 0;
     rds->num_disks = 0;
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    /* TBD: Remus setup - i.e. enable disk buffering, etc */
+    if (rs->netbufscript)
+        rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
 
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
@@ -340,6 +353,10 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
 
     /* TBD: CALL libxl__remus_device_init to init remus devices */
+    for (i = 0; i < rds->num_nics; i++) {
+        libxl__remus_device_init(egc, rds,
+                                 LIBXL__REMUS_DEVICE_NIC, &rds->nics[i]);
+    }
 
     return;
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e56567f..e85a636 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -575,6 +575,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v15 4/7] remus drbd: Implement remus drbd replicated disk
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (2 preceding siblings ...)
  2014-07-07  8:26 ` [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 5/7] libxl: network buffering cmdline switch Yang Hongyang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

Implement remus-drbd-replicated-checkpointing-disk based on
generic remus devices framework.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/block-drbd-probe |  85 +++++++++++++
 tools/libxl/Makefile                 |   2 +-
 tools/libxl/libxl_internal.h         |   3 +
 tools/libxl/libxl_remus_device.c     |  18 ++-
 tools/libxl/libxl_remus_disk_drbd.c  | 239 +++++++++++++++++++++++++++++++++++
 6 files changed, 345 insertions(+), 3 deletions(-)
 create mode 100755 tools/hotplug/Linux/block-drbd-probe
 create mode 100644 tools/libxl/libxl_remus_disk_drbd.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 721f8c0..15d1b37 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -24,6 +24,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
 XEN_SCRIPTS += external-device-migrate
 XEN_SCRIPTS += vscsi
 XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
 
 XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..3a3d446
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,85 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+#
+# Usage:
+#     block-drbd-probe devicename
+#
+# Return value:
+#     0: the device is drbd device
+#     1: the device is not drbd device
+#     2: unkown error
+#     3: the drbd device does not use protocol D
+#     4: the drbd device is not ready
+
+drbd_res=
+
+function get_res_name()
+{
+    local drbd_dev=$1
+    local drbd_dev_list=($(drbdadm sh-dev all))
+    local drbd_res_list=($(drbdadm sh-resource all))
+    local temp_drbd_dev temp_drbd_res
+    local found=0
+
+    for temp_drbd_dev in ${drbd_dev_list[@]}; do
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            found=1
+            break
+        fi
+    done
+
+    if [[ $found -eq 0 ]]; then
+        return 1
+    fi
+
+    for temp_drbd_res in ${drbd_res_list[@]}; do
+        temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+        if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+            drbd_res="$temp_drbd_res"
+            return 0
+        fi
+    done
+
+    # OOPS
+    return 2
+}
+
+get_res_name $1
+rc=$?
+if [[ $rc -ne 0 ]]; then
+    exit $rc
+fi
+
+# check protocol
+drbdsetup $1 show | grep -q "protocol D;"
+if [[ $? -ne 0 ]]; then
+    exit 3
+fi
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+    exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+    exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9c81a59..7a246af 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
 LIBXL_OBJS-y += libxl_nonetbuffer.o
 endif
 
-LIBXL_OBJS-y += libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2fe36a6..2941329 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -312,6 +312,7 @@ struct libxl__gc {
 
 /* remus device ops specific structures start */
 typedef struct libxl__remus_netbuf_state libxl__remus_netbuf_state;
+typedef struct libxl__remus_drbd_state libxl__remus_drbd_state;
 /* remus device ops specific structures end */
 
 struct libxl__ctx {
@@ -381,6 +382,7 @@ struct libxl__ctx {
 
     /* remus device ops specific structures */
     libxl__remus_netbuf_state *rns;
+    libxl__remus_drbd_state *drbd_state;
 };
 
 typedef struct {
@@ -2585,6 +2587,7 @@ struct libxl__remus_device_state {
 
     libxl_device_nic *nics;
     int num_nics;
+    libxl_device_disk *disks;
     int num_disks;
 
     /* for counting devices that have been handled */
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 5ef11b9..d940de1 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -18,8 +18,10 @@
 #include "libxl_internal.h"
 
 extern const libxl__remus_device_ops remus_device_nic;
+extern const libxl__remus_device_ops remus_device_drbd_disk;
 static const libxl__remus_device_ops *dev_ops[] = {
     &remus_device_nic,
+    &remus_device_drbd_disk,
 };
 
 /*----- checkpointing APIs -----*/
@@ -246,6 +248,13 @@ static void device_teardown_cb(libxl__egc *egc,
         rds->nics = NULL;
         rds->num_nics = 0;
 
+        /* clean disk */
+        for (i = 0; i < rds->num_disks; i++)
+            libxl_device_disk_dispose(&rds->disks[i]);
+        free(rds->disks);
+        rds->disks = NULL;
+        rds->num_disks = 0;
+
         destroy_device_ops(rs);
         rs->callback(egc, rs, rs->saved_rc);
     }
@@ -343,21 +352,26 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     rds->num_nics = 0;
     rds->num_disks = 0;
 
-    /* TBD: Remus setup - i.e. enable disk buffering, etc */
     if (rs->netbufscript)
         rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
 
+    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
+
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
 
     GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
 
-    /* TBD: CALL libxl__remus_device_init to init remus devices */
     for (i = 0; i < rds->num_nics; i++) {
         libxl__remus_device_init(egc, rds,
                                  LIBXL__REMUS_DEVICE_NIC, &rds->nics[i]);
     }
 
+    for (i = 0; i < rds->num_disks; i++) {
+        libxl__remus_device_init(egc, rds,
+                                 LIBXL__REMUS_DEVICE_DISK, &rds->disks[i]);
+    }
+
     return;
 
 out:
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..aa6891f
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+struct libxl__remus_drbd_state {
+    libxl__ao *ao;
+    char *drbd_probe_script;
+};
+
+typedef struct libxl__remus_drbd_disk {
+    libxl__remus_device remus_dev;
+    int ctl_fd;
+    int ackwait;
+    const char *path;
+} libxl__remus_drbd_disk;
+
+/*----- helper functions, for async calls -----*/
+static void drbd_async_call(libxl__remus_device *dev,
+                            void func(libxl__remus_device *),
+                            libxl__ev_child_callback callback)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, callback);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        func(dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/*----- init() and destroy() -----*/
+static int drbd_init(const libxl__remus_device_ops *self,
+                     libxl__remus_state *rs)
+{
+    libxl__remus_drbd_state *drbd_state;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(drbd_state);
+    CTX->drbd_state = drbd_state;
+    drbd_state->ao = ao;
+    drbd_state->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+                                              libxl__xen_script_dir_path());
+
+
+    return 0;
+}
+
+static void drbd_destroy(const libxl__remus_device_ops *self,
+                         libxl__remus_state *rs)
+{
+    return;
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void chekpoint_async_call_done(libxl__egc *egc,
+                                      libxl__ev_child *child,
+                                      pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_drbd_disk *rdd = dev->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    if (WIFEXITED(status)) {
+        rdd->ackwait = WEXITSTATUS(status);
+        dev->callback(egc, dev, 0);
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+/* API implementations */
+
+/* this op will not wait and block, so implement as sync op */
+static void drbd_postsuspend(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->data;
+
+    if (!rdd->ackwait) {
+        if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+            rdd->ackwait = 1;
+    }
+
+    dev->callback(dev->rds->egc, dev, 0);
+}
+
+static void drbd_preresume_async(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *rdd = dev->data;
+    int ackwait = rdd->ackwait;
+
+    if (ackwait) {
+        ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+        ackwait = 0;
+    }
+
+    _exit(ackwait);
+}
+
+static void drbd_preresume(libxl__remus_device *dev)
+{
+    drbd_async_call(dev, drbd_preresume_async, chekpoint_async_call_done);
+}
+
+/*----- main flow of control -----*/
+
+/* callbacks */
+
+static void match_async_exec_cb(libxl__egc *egc,
+                                libxl__async_exec_state *aes,
+                                int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+
+    if (status) {
+        dev->callback(egc, dev, ERROR_REMUS_DEVOPS_NOT_MATCH);
+    } else {
+        dev->callback(egc, dev, 0);
+    }
+}
+
+/* implementations, match, setup and teardown */
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    const libxl_device_disk *disk = dev->backend_dev;
+    libxl__async_exec_state *aes = &dev->aes;
+    STATE_AO_GC(dev->rds->ao);
+
+    libxl__remus_drbd_state *drbd_state = CTX->drbd_state;
+    /* setup env & args */
+    arraysize = 1;
+    GCNEW_ARRAY(aes->env, arraysize);
+    aes->env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3;
+    nr = 0;
+    GCNEW_ARRAY(aes->args, arraysize);
+    aes->args[nr++] = drbd_state->drbd_probe_script;
+    aes->args[nr++] = disk->pdev_path;
+    aes->args[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    aes->ao = drbd_state->ao;
+    aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->callback = match_async_exec_cb;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (libxl__async_exec_start(gc, aes))
+        goto out;
+
+    return;
+
+out:
+    dev->callback(egc, dev, ERROR_FAIL);
+}
+
+static void drbd_match(const libxl__remus_device_ops *self,
+                       libxl__remus_device *dev)
+{
+    match_async_exec(dev->rds->egc, dev);
+}
+
+static void drbd_setup(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk;
+    const libxl_device_disk *disk = dev->backend_dev;
+    STATE_AO_GC(dev->rds->ao);
+
+    GCNEW(drbd_disk);
+    dev->data = drbd_disk;
+    drbd_disk->path = disk->pdev_path;
+    drbd_disk->ackwait = 0;
+    drbd_disk->ctl_fd = open(drbd_disk->path, O_RDONLY);
+    if (drbd_disk->ctl_fd < 0)
+        dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+    else
+        dev->callback(dev->rds->egc, dev, 0);
+}
+
+static void drbd_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_drbd_disk *drbd_disk = dev->data;
+
+    close(drbd_disk->ctl_fd);
+    dev->callback(dev->rds->egc, dev, 0);
+}
+
+const libxl__remus_device_ops remus_device_drbd_disk = {
+    .kind = LIBXL__REMUS_DEVICE_DISK,
+    .init = drbd_init,
+    .destroy = drbd_destroy,
+    .postsuspend = drbd_postsuspend,
+    .preresume = drbd_preresume,
+    .match = drbd_match,
+    .setup = drbd_setup,
+    .teardown = drbd_teardown,
+};
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v15 5/7] libxl: network buffering cmdline switch
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (3 preceding siblings ...)
  2014-07-07  8:26 ` [PATCH v15 4/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 6/7] libxl: disk " Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 7/7] MAINTAINERS: Update maintained files of REMUS Yang Hongyang
  6 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

Command line switch to 'xl remus' command.
Also update man pages to reflect the addition of a new option to
'xl remus' command.

Note: the network buffering is enabled as default. If you want to
disable it, please use -n option.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
 docs/man/xl.conf.pod.5    |  6 ++++++
 docs/man/xl.pod.1         | 11 ++++++++++-
 tools/libxl/xl.c          |  4 ++++
 tools/libxl/xl.h          |  1 +
 tools/libxl/xl_cmdimpl.c  | 28 ++++++++++++++++++++++------
 tools/libxl/xl_cmdtable.c |  3 +++
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..e9e4a83 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -431,7 +431,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
 N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+     Disk buffering support is limited to drbd disks.
 
 B<OPTIONS>
 
@@ -450,6 +450,15 @@ Generally useful for debugging.
 
 Disable memory checkpoint compression.
 
+=item B<-n>
+
+Disable network output buffering.
+
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to setup network buffering instead of the instead of
+the default (/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 4c5a5ee..f014306 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
@@ -176,6 +177,9 @@ static void parse_global_config(const char *configfile,
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+        &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..087eb8c 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -170,6 +170,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 68df548..b324f34 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7175,8 +7175,9 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     r_info.blackhole = 0;
     r_info.compression = 1;
+    r_info.netbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7186,6 +7187,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         r_info.compression = 0;
         break;
+    case 'n':
+        r_info.netbuf = 0;
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7197,6 +7204,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if (!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (r_info.blackhole) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7234,13 +7244,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
-     * TODO: Split-Brain check.
+    /* check if the domain exists. User may have xl destroyed the
+     * domain to force failover
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
 
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..3f7520d 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -487,6 +487,9 @@ struct cmd_spec cmd_table[] = {
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
+      "-n                      Disable network output buffering.\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v15 6/7] libxl: disk buffering cmdline switch
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (4 preceding siblings ...)
  2014-07-07  8:26 ` [PATCH v15 5/7] libxl: network buffering cmdline switch Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  2014-07-07  8:26 ` [PATCH v15 7/7] MAINTAINERS: Update maintained files of REMUS Yang Hongyang
  6 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

disk buffering is enabled as default.
add an option "-d" to disable it.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 docs/man/xl.pod.1                | 4 ++++
 tools/libxl/libxl.c              | 3 +++
 tools/libxl/libxl_internal.h     | 1 +
 tools/libxl/libxl_remus_device.c | 3 ++-
 tools/libxl/libxl_types.idl      | 1 +
 tools/libxl/xl_cmdimpl.c         | 6 +++++-
 tools/libxl/xl_cmdtable.c        | 1 +
 7 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index e9e4a83..547e8a2 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -459,6 +459,10 @@ Disable network output buffering.
 Use <netbufscript> to setup network buffering instead of the instead of
 the default (/etc/xen/scripts/remus-netbuf-setup).
 
+=item B<-d>
+
+Disable disk buffering.
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b7d62c1..331689f 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -806,6 +806,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
         }
     }
 
+    /* Setup disk buffering */
+    rs->diskbuf = info->diskbuf;
+
     rs->ao = ao;
     rs->domid = domid;
     rs->saved_rc = 0;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2941329..ebe486b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2644,6 +2644,7 @@ struct libxl__remus_state {
     libxl__remus_callback *callback;
     /* Script to setup/teardown network buffers */
     const char *netbufscript;
+    bool diskbuf;
 
     /* private */
     int saved_rc;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index d940de1..c542833 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -355,7 +355,8 @@ void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
     if (rs->netbufscript)
         rds->nics = libxl_device_nic_list(CTX, rs->domid, &rds->num_nics);
 
-    rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
+    if (rs->diskbuf)
+        rds->disks = libxl_device_disk_list(CTX, rs->domid, &rds->num_disks);
 
     if (rds->num_nics == 0 && rds->num_disks == 0)
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e85a636..1056702 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -577,6 +577,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
     ("compression",  bool),
     ("netbuf",       bool),
     ("netbufscript", string),
+    ("diskbuf",      bool),
     ])
 
 libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index b324f34..167b65b 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7176,8 +7176,9 @@ int main_remus(int argc, char **argv)
     r_info.blackhole = 0;
     r_info.compression = 1;
     r_info.netbuf = 1;
+    r_info.diskbuf = 1;
 
-    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
+    SWITCH_FOREACH_OPT(opt, "bundi:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7193,6 +7194,9 @@ int main_remus(int argc, char **argv)
     case 'N':
         r_info.netbufscript = optarg;
         break;
+    case 'd':
+        r_info.diskbuf = 0;
+        break;
     case 's':
         ssh_command = optarg;
         break;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 3f7520d..246aa11 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -490,6 +490,7 @@ struct cmd_spec cmd_table[] = {
       "-n                      Disable network output buffering.\n"
       "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
       "                        instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n"
+      "-d                      Disable disk buffering.\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v15 7/7] MAINTAINERS: Update maintained files of REMUS
  2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
                   ` (5 preceding siblings ...)
  2014-07-07  8:26 ` [PATCH v15 6/7] libxl: disk " Yang Hongyang
@ 2014-07-07  8:26 ` Yang Hongyang
  6 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-07  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	eddie.dong, rshriram, laijs

Update maintained files of REMUS.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 MAINTAINERS | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266e47b..9c407dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -262,6 +262,11 @@ S:	Maintained
 F:	docs/README.remus
 F:	tools/blktap2/drivers/block-remus.c
 F:	tools/blktap2/drivers/hashtable*
+F:	tools/libxl/libxl_remus_*
+F:	tools/libxl/libxl_netbuffer.c
+F:	tools/libxl/libxl_nonetbuffer.c
+F:	tools/hotplug/Linux/remus-netbuf-setup
+F:	tools/hotplug/Linux/block-drbd-probe
 
 SCHEDULING
 M:	George Dunlap <george.dunlap@eu.citrix.com>
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
  2014-07-07  8:26 ` [PATCH v15 2/7] remus: introduce remus device Yang Hongyang
@ 2014-07-09 17:26   ` Ian Jackson
  2014-07-10  2:18     ` Hongyang Yang
  2014-07-11  8:18     ` Hongyang Yang
  0 siblings, 2 replies; 20+ messages in thread
From: Ian Jackson @ 2014-07-09 17:26 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell

Yang Hongyang writes ("[PATCH v15 2/7] remus: introduce remus device"):
> introduce remus device, an abstract layer of remus devices(nic, disk,
> etc).It provides the following APIs for libxl:
>   >libxl__remus_device_setup
>     setup remus devices, like attach qdisc, enable disk buffering, etc
>   >libxl__remus_device_teardown
>     teardown devices
>   >libxl__remus_device_postsuspend
>   >libxl__remus_device_preresume
>   >libxl__remus_device_commit
>     above three are for checkpoint.

Sorry to mention this now, but I think it would be clearer if all
these libxl__remus_device_* functions which manipulate _all_ the
devices for a domain used the plural of "device", ie
libxl__remus_devices_setup, etc.

> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 39f1c28..bcbd02b 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -733,6 +733,31 @@ out:
>  static void remus_failover_cb(libxl__egc *egc,
>                                libxl__domain_suspend_state *dss, int rc);
>  
> +static void libxl__remus_setup_failed(libxl__egc *egc,
> +                                      libxl__remus_state *rs, int rc)
> +{
> +    STATE_AO_GC(rs->ao);
> +    libxl__ao_complete(egc, ao, rc);
> +}
> +
> +static void libxl__remus_setup_done(libxl__egc *egc,
> +                                    libxl__remus_state *rs, int rc)

This callback appears after the code which sets it up, which naturally
executes first.  Things should be in chronological order, as I said.

> +{
> +    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
> +    STATE_AO_GC(rs->ao);
> +
> +    if (!rc) {
> +        libxl__domain_suspend(egc, dss);
> +        return;
> +    }
> +
> +    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
> +        dss->domid);

This log message should report the rc.

> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 83eb29a..8cc90dc 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data)
>      domain_suspend_callback_common(egc, dss);
>  }
>  
> +static void remus_device_postsuspend_cb(libxl__egc *egc,
> +                                        libxl__remus_state *rs, int rc)
> +{

I can't quite tell, but isn't this code not in chronological order ?

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index f8441f6..8ef20a0 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
...
> +/*
> + * This structure is for remus device layer, it records remus devices
> + * that have been setuped.
> + */

Still lots of uses of "setuped".  Correct would be "devices that have
been set up".

> +/*----- checkpointing APIs -----*/
> +
> +/* callbacks */
> +
> +static void device_common_cb(libxl__egc *egc,
> +                             libxl__remus_device *dev,
> +                             int rc)
> +{

I asked for things to be in chronological order in the file.  I think
this function therefore ought to be below these checkpoint functions.

Also its name isn't very suggestive.  Perhaps it should have
"checkpoint" in the name ?

> +/* API implementations */
> +
> +void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
...
> +void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
...
> +void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)

These three functions are nearly identical.  They should be combined
somehow.  Perhaps there should be a single function which takes the
offset of the per-device operation vtable member as a parameter.  Or
perhaps the three functions should be generated by a macro.  Currently
I'm leaning towards the former.

> +    if(rds->num_set_up == 0)
         ^
Coding style.

> +    for (i = 0; i < rds->num_set_up; i++) {
> +        dev = rds->dev[i];
> +        dev->callback = device_common_cb;
> +        if (dev->ops->commit) {
> +            dev->ops->commit(dev);
> +        } else {
> +            rds->num_devices++;
> +            if (rds->num_devices == rds->num_set_up)
> +                rs->callback(egc, rs, rs->saved_rc);

Why not have this call device_common_cb rather than open-coding it ?

> +/*----- main flow of control -----*/

This is for setup, right ?  Or teardown ?  Or both ?

> +/* callbacks */
> +
> +static void device_setup_cb(libxl__egc *egc,
> +                            libxl__remus_device *dev,
> +                            int rc);
> +static void device_match_cb(libxl__egc *egc,
> +                            libxl__remus_device *dev,
> +                            int rc)
> +{

This is still not in chronological order.  You must put callbacks
after the things that set them up, since obviously the callback
happens after it is set up.

Seeing things in the wrong order makes it hard to see what's going on.
In the interests of trying to reduce the number of iterations of this
series I'm going to do a kind of peephole review of what I see here;
you should bear in mind that I'm not always trying to understand what
the purpose of these functions is so I may have structural comments on
your next version.

> +    libxl__remus_device_state *const rds = dev->rds;
> +    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
> +
> +    STATE_AO_GC(rs->ao);
> +
> +    if (rc) {
> +        if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||

This use of ++ looks confusing to me.

> +            rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
> +            /* the device can not be matched */
> +            rds->num_devices++;
> +            rs->saved_rc = ERROR_FAIL;

Please can we not return more ERROR_FAIL.  Instead, why not return
ERROR_REMUS_DEVICE_NOT_SUPPORTED as I suggested before ?

> +    rds->num_devices++;
> +    /*
> +     * we add devices that have been setuped to the array no matter
> +     * the setup process succeed or failed because we need to ensure
> +     * the device been teardown while setup failed. If any of the
> +     * device setup failed, we will quit remus, but before we exit,
> +     * we will teardown the devices that have been added to **dev
> +     */
> +    rds->dev[rds->num_set_up++] = dev;
> +    if (rc) {
> +        /* setup failed */
> +        rs->saved_rc = ERROR_FAIL;
> +    }

This doesn't look right.  If the setup fails, presumably we shouldn't
put the device in the array ?  If we do it will presumably be torn
down later.

> +static void destroy_device_ops(libxl__remus_state *rs);
> +static void device_teardown_cb(libxl__egc *egc,
> +                               libxl__remus_device *dev,
> +                               int rc)
> +{
> +    libxl__remus_device_state *const rds = dev->rds;
> +    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
> +
> +    STATE_AO_GC(rs->ao);
> +
> +    /* ignore teardown errors to teardown as many devs as possible*/
> +    rds->num_set_up--;

We should at least preserve the rc so that we don't pretend that
everything worked if it didn't.

You may need to invent a new teardown rc variable, and/or add logging,
so that you get the right error code out, and the right log messages,
if a setup fails and causes a teardown which also fails.

> +static int init_device_ops(libxl__remus_state *rs)
> +{
> +    int i, rc;
> +    const libxl__remus_device_ops *ops;
> +
> +    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
> +        ops = dev_ops[i];
> +        if (ops->init(ops, rs)) {
> +            rc = ERROR_FAIL;

In general, you should preserve the rc, not turn everything into
ERROR_FAIL.

I still don't understand why libxl__remus_device_state is not part of
libxl__remus_state.

I wonder if you need a different name for "the abstract thing which is
represented by a libxl__remus_device_ops".  Since each ops only deals
with one kind (but perhaps a kind may have several ops), I suggest
calling them "subkind"s or "devsubkinds" or something.  At the moment
you seem to be calling them "ops".

As a result I think this function is poorly named.  It doesn't
initialise any ops.  It initialises each subkind.  Perhaps it should
be called init_device_subkind.

You might consider whether libxl__remus_device_ops should be
libxl__remus_subkind_ops perhaps or libxl__remus_devsubkind_ops or
libxl__remus_device_subkind_ops.

> +void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
> +{
> +    STATE_AO_GC(rs->ao);
> +
> +    /* Convenience aliases */
> +    libxl__remus_device_state *const rds = &rs->dev_state;
> +
> +    if (ARRAY_SIZE(dev_ops) == 0)
> +        goto out;

Why is this special case necessary ?

> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index de25f42..e56567f 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [
>      (-12, "OSEVENT_REG_FAIL"),
>      (-13, "BUFFERFULL"),
>      (-14, "UNKNOWN_CHILD"),
> +    (-15, "REMUS_DEVOPS_NOT_MATCH"),

I don't object to this name for this exact case and it's fine to have
a specific error code.

But if you are going to make the whole machinery return
ERROR_REMUS_DEVICE_NOT_SUPPORTED if no subkind matches, then you can
reuse that error code for the subkind methods.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices
  2014-07-07  8:26 ` [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
@ 2014-07-09 23:15   ` Ian Jackson
  2014-07-10  3:03     ` Hongyang Yang
  2014-07-10 12:53     ` Ian Campbell
  0 siblings, 2 replies; 20+ messages in thread
From: Ian Jackson @ 2014-07-09 23:15 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
	xen-devel, eddie.dong, rshriram, laijs

Yang Hongyang writes ("[PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices"):
> diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
> new file mode 100644
> index 0000000..58c46f3
> --- /dev/null
> +++ b/tools/hotplug/Linux/remus-netbuf-setup
> @@ -0,0 +1,203 @@
> +#!/bin/bash
> +#============================================================================
> +# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
...
> +# IFB         ifb interface to be cleaned up (required). [for teardown op only]

It occurs to me to ask: Is it wise to use such a bare name for the
variable IFB ?  I would suggest that it might be better to use a
variable with XEN or REMUS or something in it.  Otherwise perhaps
someone somewhere might think IFB stood for something else and would
set it to something irrelevant (or your setting of it might break
their usage...)

I had some other comments about this script in v10, which we were
having a conversation about.  I'm afraid I seem to have dropped the
thread of that.  I will reply separately in that thread.

> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index bcbd02b..b7d62c1 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -788,6 +788,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
...
> +    /* Setup network buffering */

This comment is misleading.  This code doesn't actually set up any
network buffering.  It doesn't call the setup script.  All it does is
some checks and computing the script path.

It would probably be better to simply remove the comment.  Or you
could replace it with something more accurate.

> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 8ef20a0..2fe36a6 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -310,6 +310,10 @@ struct libxl__gc {
>      libxl_ctx *owner;
>  };
>  
> +/* remus device ops specific structures start */
> +typedef struct libxl__remus_netbuf_state libxl__remus_netbuf_state;
> +/* remus device ops specific structures end */

I don't think these comments add anything.  You should move these
struct typedefs alongside the other struct typedefs (libxl__ao et al).
There is then no need to comment what they are.

>  struct libxl__ctx {
>      xentoollog_logger *lg;
>      xc_interface *xch;
> @@ -374,6 +378,9 @@ struct libxl__ctx {
>      LIBXL_LIST_ENTRY(libxl_ctx) sigchld_users_entry;
>  
>      libxl_version_info version_info;
> +
> +    /* remus device ops specific structures */
> +    libxl__remus_netbuf_state *rns;

Again, I think this comment adds nothing that isn't in the type name.
There is no need to say in a comment anything that is plain from the
code.

But before you delete it (and other comments that I'm going to quibble
about), please wait for a comment from the other tools maintainer, Ian
Campbell, who may disagree.

> @@ -2631,6 +2639,8 @@ struct libxl__remus_state {
>      libxl__ao *ao;
>      uint32_t domid;
>      libxl__remus_callback *callback;
> +    /* Script to setup/teardown network buffers */
> +    const char *netbufscript;

And again.

> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
> index 52d593c..025ee89 100644
> --- a/tools/libxl/libxl_netbuffer.c
> +++ b/tools/libxl/libxl_netbuffer.c
> @@ -17,11 +17,486 @@
>  
>  #include "libxl_internal.h"
>  
> +#include <netlink/cache.h>
> +#include <netlink/socket.h>
> +#include <netlink/attr.h>
> +#include <netlink/route/link.h>
> +#include <netlink/route/route.h>
> +#include <netlink/route/qdisc.h>
> +#include <netlink/route/qdisc/plug.h>
> +
> +struct libxl__remus_netbuf_state {
> +    libxl__ao *ao;
> +    uint32_t domid;
> +    const char *netbufscript;

Why is there a copy of netbufscript here ?

> +/*----- init() and destroy() -----*/
> +
> +static int nic_init(const libxl__remus_device_ops *self,
> +                    libxl__remus_state *rs)
> +{

This function seems to leak the things it allocates, on error.
The `out' section should probably call destroy.

> +    ns->nlsock = nl_socket_alloc();
> +    if (!ns->nlsock) {
> +        LOG(ERROR, "cannot allocate nl socket");
> +        goto out;
> +    }

Last time I commented on the error handling in this patch.  I see that
you have added an nl_geterror in some places.

When I make a comment like that, you need to apply it whereever
relevant in the code, and to the whole patch series.

> +static void nic_destroy(const libxl__remus_device_ops *self,
> +                        libxl__remus_state *rs)
...
> +    /* free qdisc cache */
> +    if (ns->qdisc_cache) {
> +        nl_cache_clear(ns->qdisc_cache);
> +        nl_cache_free(ns->qdisc_cache);

Can these functions fail ?  If so, what does it mean and what should
the program do ?

> +/*----- checkpointing APIs -----*/
> +
> +/* The buffer_op's value, not the value passed to kernel */

I would say "The value of buffer_op, not the value passed to kernel".

> +static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
> +                           libxl__remus_netbuf_state *netbuf_state,
> +                           int buffer_op)
> +{
> +    int rc;
> +
> +    STATE_AO_GC(netbuf_state->ao);
> +
> +    if (buffer_op == tc_buffer_start)
> +        rc = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
> +    else
> +        rc = rtnl_qdisc_plug_release_one(remus_nic->qdisc);

Error handling again.

Also do not put anything other than a libxl error code in a variable
called "rc".

> +/*----- main flow of control -----*/
> +
> +/* helper functions */
> +
> +/*
> + * If the device has a vifname, then use that instead of
> + * the vifX.Y format.
> + * it must ONLY be used for remus because if driver domains
> + * were in use it would constitute a security vulnerability.
> + */
> +static const char *get_vifname(libxl__remus_device *dev,
> +                               const libxl_device_nic *nic)
> +{
...
> +    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
> +                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);

Please use GCSPRINTF where applicable.

> +    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
> +    if (!rc && !vifname) {
> +        /* use the default name */

This comment adds nothing to the comment above.

> +        vifname = libxl__device_nic_devname(gc, domid,
> +                                            nic->devid,
> +                                            nic->nictype);

> +    rc = ERROR_FAIL;
> +    ifindex = rtnl_link_get_ifindex(ifb);
> +    if (!ifindex) {
> +        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
> +        goto out;

How does rtnl_link_get_ifindex report its error code ?  Or can it fail
only one way ?

> +    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
> +                                     TC_H_ROOT);
> +
> +    if (qdisc) {
> +        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
> +        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
> +        if (!tc_kind || strcmp(tc_kind, "plug")) {
> +            nl_object_put((struct nl_object *)qdisc);

This freeing of qdisc should be in the `out' section.

You can do this either by arranging that the `out' section is not used
at all on successful completion, or by saying something like
    if (rc) {
        if (qdisc) nl_object_put((struct nl_object *)qdisc);
    }

> +            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
> +            goto out;
> +        }
> +        remus_nic->qdisc = qdisc;
> +        rc = 0;
> +    } else {
> +        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);

This path out of the function seems not to set rc.

... In fact, I see that you initialise rc to ERROR_FAIL above.  IMO
this is not a good programming paradigm.  rc should be set, obviously,
on every exit path, at the point the error is generated.

> +/*
> + * In return, the script writes the name of IFB device (during setup) to be
> + * used for output buffering into XENBUS_PATH/ifb
> + */
> +static void netbuf_setup_script_cb(libxl__egc *egc,
> +                                   libxl__async_exec_state *aes,
> +                                   int status)

This callback is not in chronological order.

I'm afraid I've run out of time now so I'm going to stop here.  I hope
these comments have been helpful.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
  2014-07-09 17:26   ` Ian Jackson
@ 2014-07-10  2:18     ` Hongyang Yang
  2014-07-10 17:34       ` Ian Jackson
  2014-07-11  8:18     ` Hongyang Yang
  1 sibling, 1 reply; 20+ messages in thread
From: Hongyang Yang @ 2014-07-10  2:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell



On 07/10/2014 01:26 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v15 2/7] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provides the following APIs for libxl:
>>    >libxl__remus_device_setup
>>      setup remus devices, like attach qdisc, enable disk buffering, etc
>>    >libxl__remus_device_teardown
>>      teardown devices
>>    >libxl__remus_device_postsuspend
>>    >libxl__remus_device_preresume
>>    >libxl__remus_device_commit
>>      above three are for checkpoint.
>
> Sorry to mention this now, but I think it would be clearer if all
> these libxl__remus_device_* functions which manipulate _all_ the
> devices for a domain used the plural of "device", ie
> libxl__remus_devices_setup, etc.

Ok

>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 39f1c28..bcbd02b 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -733,6 +733,31 @@ out:
>>   static void remus_failover_cb(libxl__egc *egc,
>>                                 libxl__domain_suspend_state *dss, int rc);
>>
>> +static void libxl__remus_setup_failed(libxl__egc *egc,
>> +                                      libxl__remus_state *rs, int rc)
>> +{
>> +    STATE_AO_GC(rs->ao);
>> +    libxl__ao_complete(egc, ao, rc);
>> +}
>> +
>> +static void libxl__remus_setup_done(libxl__egc *egc,
>> +                                    libxl__remus_state *rs, int rc)
>
> This callback appears after the code which sets it up, which naturally
> executes first.  Things should be in chronological order, as I said.
>
>> +{
>> +    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    if (!rc) {
>> +        libxl__domain_suspend(egc, dss);
>> +        return;
>> +    }
>> +
>> +    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
>> +        dss->domid);
>
> This log message should report the rc.
>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 83eb29a..8cc90dc 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data)
>>       domain_suspend_callback_common(egc, dss);
>>   }
>>
>> +static void remus_device_postsuspend_cb(libxl__egc *egc,
>> +                                        libxl__remus_state *rs, int rc)
>> +{
>
> I can't quite tell, but isn't this code not in chronological order ?
>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index f8441f6..8ef20a0 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
> ...
>> +/*
>> + * This structure is for remus device layer, it records remus devices
>> + * that have been setuped.
>> + */
>
> Still lots of uses of "setuped".  Correct would be "devices that have
> been set up".
>
>> +/*----- checkpointing APIs -----*/
>> +
>> +/* callbacks */
>> +
>> +static void device_common_cb(libxl__egc *egc,
>> +                             libxl__remus_device *dev,
>> +                             int rc)
>> +{
>
> I asked for things to be in chronological order in the file.  I think
> this function therefore ought to be below these checkpoint functions.

I used to thought that the definitions of callbacks should be before
the functions that set them, sorry for the misunderstanding, I will
correct them.

>
> Also its name isn't very suggestive.  Perhaps it should have
> "checkpoint" in the name ?
>
>> +/* API implementations */
>> +
>> +void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
> ...
>> +void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
> ...
>> +void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)
>
> These three functions are nearly identical.  They should be combined
> somehow.  Perhaps there should be a single function which takes the
> offset of the per-device operation vtable member as a parameter.  Or
> perhaps the three functions should be generated by a macro.  Currently
> I'm leaning towards the former.

Good suggestions

>
>> +    if(rds->num_set_up == 0)
>           ^
> Coding style.
>
>> +    for (i = 0; i < rds->num_set_up; i++) {
>> +        dev = rds->dev[i];
>> +        dev->callback = device_common_cb;
>> +        if (dev->ops->commit) {
>> +            dev->ops->commit(dev);
>> +        } else {
>> +            rds->num_devices++;
>> +            if (rds->num_devices == rds->num_set_up)
>> +                rs->callback(egc, rs, rs->saved_rc);
>
> Why not have this call device_common_cb rather than open-coding it ?

Sorry for the poor named function, device_common_cb should be used at
checkpoint. will add checkpoint to the function name.

>
>> +/*----- main flow of control -----*/
>
> This is for setup, right ?  Or teardown ?  Or both ?
>
>> +/* callbacks */
>> +
>> +static void device_setup_cb(libxl__egc *egc,
>> +                            libxl__remus_device *dev,
>> +                            int rc);
>> +static void device_match_cb(libxl__egc *egc,
>> +                            libxl__remus_device *dev,
>> +                            int rc)
>> +{
>
> This is still not in chronological order.  You must put callbacks
> after the things that set them up, since obviously the callback
> happens after it is set up.
>
> Seeing things in the wrong order makes it hard to see what's going on.
> In the interests of trying to reduce the number of iterations of this
> series I'm going to do a kind of peephole review of what I see here;
> you should bear in mind that I'm not always trying to understand what
> the purpose of these functions is so I may have structural comments on
> your next version.

Ok, just don't quite follow the meaning of chronological order, but it's
more clearer for me now.

>
>> +    libxl__remus_device_state *const rds = dev->rds;
>> +    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    if (rc) {
>> +        if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||
>
> This use of ++ looks confusing to me.
>
>> +            rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
>> +            /* the device can not be matched */
>> +            rds->num_devices++;
>> +            rs->saved_rc = ERROR_FAIL;
>
> Please can we not return more ERROR_FAIL.  Instead, why not return
> ERROR_REMUS_DEVICE_NOT_SUPPORTED as I suggested before ?

I will add this error code.

>
>> +    rds->num_devices++;
>> +    /*
>> +     * we add devices that have been setuped to the array no matter
>> +     * the setup process succeed or failed because we need to ensure
>> +     * the device been teardown while setup failed. If any of the
>> +     * device setup failed, we will quit remus, but before we exit,
>> +     * we will teardown the devices that have been added to **dev
>> +     */
>> +    rds->dev[rds->num_set_up++] = dev;
>> +    if (rc) {
>> +        /* setup failed */
>> +        rs->saved_rc = ERROR_FAIL;
>> +    }
>
> This doesn't look right.  If the setup fails, presumably we shouldn't
> put the device in the array ?  If we do it will presumably be torn
> down later.

the netbuf script was designed as below:
1. when setup failed, the script won't teardown the device itself.
2. the teardown op is ok to be excute many times.

In the remus device layer, if one device setup failed(whether script
exit or the script get killed or something), we will quit
remus, but before that, we will teardown all device that has been set
up, whether it's correctly set up or not. So if we don't put the
device in the array, we will get a leaked device, that has not been
teardown.

>
>> +static void destroy_device_ops(libxl__remus_state *rs);
>> +static void device_teardown_cb(libxl__egc *egc,
>> +                               libxl__remus_device *dev,
>> +                               int rc)
>> +{
>> +    libxl__remus_device_state *const rds = dev->rds;
>> +    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* ignore teardown errors to teardown as many devs as possible*/
>> +    rds->num_set_up--;
>
> We should at least preserve the rc so that we don't pretend that
> everything worked if it didn't.
>
> You may need to invent a new teardown rc variable, and/or add logging,
> so that you get the right error code out, and the right log messages,
> if a setup fails and causes a teardown which also fails.

Ok, thanks.

>
>> +static int init_device_ops(libxl__remus_state *rs)
>> +{
>> +    int i, rc;
>> +    const libxl__remus_device_ops *ops;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>> +        ops = dev_ops[i];
>> +        if (ops->init(ops, rs)) {
>> +            rc = ERROR_FAIL;
>
> In general, you should preserve the rc, not turn everything into
> ERROR_FAIL.

Ok, thanks.

>
> I still don't understand why libxl__remus_device_state is not part of
> libxl__remus_state.

libxl__remus_device_state is part of libxl__remus_state:
+struct libxl__remus_state {
+    /* must set by caller of libxl__remus_device_(setup|teardown) */
+    libxl__ao *ao;
+    uint32_t domid;
+    libxl__remus_callback *callback;
+
+    /* private */
+    int saved_rc;
+    /* context containing device related stuff */
+    libxl__remus_device_state dev_state;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    libxl__ev_time timeout; /* used for checkpoint */
+};

>
> I wonder if you need a different name for "the abstract thing which is
> represented by a libxl__remus_device_ops".  Since each ops only deals
> with one kind (but perhaps a kind may have several ops), I suggest
> calling them "subkind"s or "devsubkinds" or something.  At the moment
> you seem to be calling them "ops".
>
> As a result I think this function is poorly named.  It doesn't
> initialise any ops.  It initialises each subkind.  Perhaps it should
> be called init_device_subkind.
>
> You might consider whether libxl__remus_device_ops should be
> libxl__remus_subkind_ops perhaps or libxl__remus_devsubkind_ops or
> libxl__remus_device_subkind_ops.

Ok, thanks.

>
>> +void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
>> +{
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* Convenience aliases */
>> +    libxl__remus_device_state *const rds = &rs->dev_state;
>> +
>> +    if (ARRAY_SIZE(dev_ops) == 0)
>> +        goto out;
>
> Why is this special case necessary ?
>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index de25f42..e56567f 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "REMUS_DEVOPS_NOT_MATCH"),
>
> I don't object to this name for this exact case and it's fine to have
> a specific error code.
>
> But if you are going to make the whole machinery return
> ERROR_REMUS_DEVICE_NOT_SUPPORTED if no subkind matches, then you can
> reuse that error code for the subkind methods.

Will add ERROR_REMUS_DEVICE_NOT_SUPPORTED error code.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices
  2014-07-09 23:15   ` Ian Jackson
@ 2014-07-10  3:03     ` Hongyang Yang
  2014-07-10 17:43       ` Ian Jackson
  2014-07-10 12:53     ` Ian Campbell
  1 sibling, 1 reply; 20+ messages in thread
From: Hongyang Yang @ 2014-07-10  3:03 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell



On 07/10/2014 07:15 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices"):
>> diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
>> new file mode 100644
>> index 0000000..58c46f3
>> --- /dev/null
>> +++ b/tools/hotplug/Linux/remus-netbuf-setup
>> @@ -0,0 +1,203 @@
>> +#!/bin/bash
>> +#============================================================================
>> +# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
> ...
>> +# IFB         ifb interface to be cleaned up (required). [for teardown op only]
>
> It occurs to me to ask: Is it wise to use such a bare name for the
> variable IFB ?  I would suggest that it might be better to use a
> variable with XEN or REMUS or something in it.  Otherwise perhaps
> someone somewhere might think IFB stood for something else and would
> set it to something irrelevant (or your setting of it might break
> their usage...)

Maybe REMUS_IFB?

>
> I had some other comments about this script in v10, which we were
> having a conversation about.  I'm afraid I seem to have dropped the
> thread of that.  I will reply separately in that thread.

I have replied in that thread.

>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index bcbd02b..b7d62c1 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -788,6 +788,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
> ...
>> +    /* Setup network buffering */
>
> This comment is misleading.  This code doesn't actually set up any
> network buffering.  It doesn't call the setup script.  All it does is
> some checks and computing the script path.
>
> It would probably be better to simply remove the comment.  Or you
> could replace it with something more accurate.

Will remove the comment, thanks.

>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 8ef20a0..2fe36a6 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -310,6 +310,10 @@ struct libxl__gc {
>>       libxl_ctx *owner;
>>   };
>>
>> +/* remus device ops specific structures start */
>> +typedef struct libxl__remus_netbuf_state libxl__remus_netbuf_state;
>> +/* remus device ops specific structures end */
>
> I don't think these comments add anything.  You should move these
> struct typedefs alongside the other struct typedefs (libxl__ao et al).
> There is then no need to comment what they are.
>
>>   struct libxl__ctx {
>>       xentoollog_logger *lg;
>>       xc_interface *xch;
>> @@ -374,6 +378,9 @@ struct libxl__ctx {
>>       LIBXL_LIST_ENTRY(libxl_ctx) sigchld_users_entry;
>>
>>       libxl_version_info version_info;
>> +
>> +    /* remus device ops specific structures */
>> +    libxl__remus_netbuf_state *rns;
>
> Again, I think this comment adds nothing that isn't in the type name.
> There is no need to say in a comment anything that is plain from the
> code.
>
> But before you delete it (and other comments that I'm going to quibble
> about), please wait for a comment from the other tools maintainer, Ian
> Campbell, who may disagree.
>
>> @@ -2631,6 +2639,8 @@ struct libxl__remus_state {
>>       libxl__ao *ao;
>>       uint32_t domid;
>>       libxl__remus_callback *callback;
>> +    /* Script to setup/teardown network buffers */
>> +    const char *netbufscript;
>
> And again.
>
>> diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
>> index 52d593c..025ee89 100644
>> --- a/tools/libxl/libxl_netbuffer.c
>> +++ b/tools/libxl/libxl_netbuffer.c
>> @@ -17,11 +17,486 @@
>>
>>   #include "libxl_internal.h"
>>
>> +#include <netlink/cache.h>
>> +#include <netlink/socket.h>
>> +#include <netlink/attr.h>
>> +#include <netlink/route/link.h>
>> +#include <netlink/route/route.h>
>> +#include <netlink/route/qdisc.h>
>> +#include <netlink/route/qdisc/plug.h>
>> +
>> +struct libxl__remus_netbuf_state {
>> +    libxl__ao *ao;
>> +    uint32_t domid;
>> +    const char *netbufscript;
>
> Why is there a copy of netbufscript here ?

We use netbuf state in device concrete layer, so we made a copy
of this.

>
>> +/*----- init() and destroy() -----*/
>> +
>> +static int nic_init(const libxl__remus_device_ops *self,
>> +                    libxl__remus_state *rs)
>> +{
>
> This function seems to leak the things it allocates, on error.
> The `out' section should probably call destroy.

If this failed, we will call teardown, and in teardown, the
nic_destroy will be called.

>
>> +    ns->nlsock = nl_socket_alloc();
>> +    if (!ns->nlsock) {
>> +        LOG(ERROR, "cannot allocate nl socket");
>> +        goto out;
>> +    }
>
> Last time I commented on the error handling in this patch.  I see that
> you have added an nl_geterror in some places.
>
> When I make a comment like that, you need to apply it whereever
> relevant in the code, and to the whole patch series.

Yes, actually I've checked the whole patch about this and
corrected them. for this special case, nl_socket_alloc() doesn't
return an error number, so we can't use nl_geterror, it returns
NULL if alloc failed, so I think this error message should be ok.

>
>> +static void nic_destroy(const libxl__remus_device_ops *self,
>> +                        libxl__remus_state *rs)
> ...
>> +    /* free qdisc cache */
>> +    if (ns->qdisc_cache) {
>> +        nl_cache_clear(ns->qdisc_cache);
>> +        nl_cache_free(ns->qdisc_cache);
>
> Can these functions fail ?  If so, what does it mean and what should
> the program do ?

No, they both return void.

>
>> +/*----- checkpointing APIs -----*/
>> +
>> +/* The buffer_op's value, not the value passed to kernel */
>
> I would say "The value of buffer_op, not the value passed to kernel".

Ok, thanks.

>
>> +static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
>> +                           libxl__remus_netbuf_state *netbuf_state,
>> +                           int buffer_op)
>> +{
>> +    int rc;
>> +
>> +    STATE_AO_GC(netbuf_state->ao);
>> +
>> +    if (buffer_op == tc_buffer_start)
>> +        rc = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
>> +    else
>> +        rc = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
>
> Error handling again.

I can't quite follow you, what's wrong with the error handling in this
function except the name of the "rc"?

>
> Also do not put anything other than a libxl error code in a variable
> called "rc".

Ok.

>
>> +/*----- main flow of control -----*/
>> +
>> +/* helper functions */
>> +
>> +/*
>> + * If the device has a vifname, then use that instead of
>> + * the vifX.Y format.
>> + * it must ONLY be used for remus because if driver domains
>> + * were in use it would constitute a security vulnerability.
>> + */
>> +static const char *get_vifname(libxl__remus_device *dev,
>> +                               const libxl_device_nic *nic)
>> +{
> ...
>> +    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
>> +                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
>
> Please use GCSPRINTF where applicable.

Ok, thanks.

>
>> +    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
>> +    if (!rc && !vifname) {
>> +        /* use the default name */
>
> This comment adds nothing to the comment above.
>
>> +        vifname = libxl__device_nic_devname(gc, domid,
>> +                                            nic->devid,
>> +                                            nic->nictype);
>
>> +    rc = ERROR_FAIL;
>> +    ifindex = rtnl_link_get_ifindex(ifb);
>> +    if (!ifindex) {
>> +        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
>> +        goto out;
>
> How does rtnl_link_get_ifindex report its error code ?  Or can it fail
> only one way ?

It Returns Interface index or 0 if not set. No error code will be returned.

>
>> +    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
>> +                                     TC_H_ROOT);
>> +
>> +    if (qdisc) {
>> +        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
>> +        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
>> +        if (!tc_kind || strcmp(tc_kind, "plug")) {
>> +            nl_object_put((struct nl_object *)qdisc);
>
> This freeing of qdisc should be in the `out' section.
>
> You can do this either by arranging that the `out' section is not used
> at all on successful completion, or by saying something like
>      if (rc) {
>          if (qdisc) nl_object_put((struct nl_object *)qdisc);
>      }

Ok thanks.

>
>> +            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
>> +            goto out;
>> +        }
>> +        remus_nic->qdisc = qdisc;
>> +        rc = 0;
>> +    } else {
>> +        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
>
> This path out of the function seems not to set rc.
>
> ... In fact, I see that you initialise rc to ERROR_FAIL above.  IMO
> this is not a good programming paradigm.  rc should be set, obviously,
> on every exit path, at the point the error is generated.

Ok thanks.

>
>> +/*
>> + * In return, the script writes the name of IFB device (during setup) to be
>> + * used for output buffering into XENBUS_PATH/ifb
>> + */
>> +static void netbuf_setup_script_cb(libxl__egc *egc,
>> +                                   libxl__async_exec_state *aes,
>> +                                   int status)
>
> This callback is not in chronological order.
>
> I'm afraid I've run out of time now so I'm going to stop here.  I hope
> these comments have been helpful.

Thank you for patiently reviewing this.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices
  2014-07-09 23:15   ` Ian Jackson
  2014-07-10  3:03     ` Hongyang Yang
@ 2014-07-10 12:53     ` Ian Campbell
  1 sibling, 0 replies; 20+ messages in thread
From: Ian Campbell @ 2014-07-10 12:53 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, Yang Hongyang

On Thu, 2014-07-10 at 00:15 +0100, Ian Jackson wrote:
> Again, I think this comment adds nothing that isn't in the type name.
> There is no need to say in a comment anything that is plain from the
> code.

> But before you delete it (and other comments that I'm going to quibble
> about), please wait for a comment from the other tools maintainer, Ian
> Campbell, who may disagree.

I've not looked at all of them but the random sampling I glanced at did
seem unnecessary.

Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
  2014-07-10  2:18     ` Hongyang Yang
@ 2014-07-10 17:34       ` Ian Jackson
  2014-07-11  2:12         ` Hongyang Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2014-07-10 17:34 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell

Hongyang Yang writes ("Re: [PATCH v15 2/7] remus: introduce remus device"):
> On 07/10/2014 01:26 AM, Ian Jackson wrote:
> > Sorry to mention this now, but I think it would be clearer if all
> > these libxl__remus_device_* functions which manipulate _all_ the
> > devices for a domain used the plural of "device", ie
> > libxl__remus_devices_setup, etc.
> 
> Ok

Thanks.  You may find git-filter-branch helpful to do this easily.
(Be careful not to blow your leg off.)

> >> +    rds->num_devices++;
> >> +    /*
> >> +     * we add devices that have been setuped to the array no matter
> >> +     * the setup process succeed or failed because we need to ensure
> >> +     * the device been teardown while setup failed. If any of the
> >> +     * device setup failed, we will quit remus, but before we exit,
> >> +     * we will teardown the devices that have been added to **dev
> >> +     */
> >> +    rds->dev[rds->num_set_up++] = dev;
> >> +    if (rc) {
> >> +        /* setup failed */
> >> +        rs->saved_rc = ERROR_FAIL;
> >> +    }
> >
> > This doesn't look right.  If the setup fails, presumably we shouldn't
> > put the device in the array ?  If we do it will presumably be torn
> > down later.
> 
> the netbuf script was designed as below:
> 1. when setup failed, the script won't teardown the device itself.
> 2. the teardown op is ok to be excute many times.

Aha.  Right.

I think you should state exactly that, probably in a comment here and
also in the script itself.  This can replace the comment you have
above, which is rather vague.

> In the remus device layer, if one device setup failed(whether script
> exit or the script get killed or something), we will quit
> remus, but before that, we will teardown all device that has been set
> up, whether it's correctly set up or not. So if we don't put the
> device in the array, we will get a leaked device, that has not been
> teardown.

That makes sense.

> > I still don't understand why libxl__remus_device_state is not part of
> > libxl__remus_state.
> 
> libxl__remus_device_state is part of libxl__remus_state:
> +struct libxl__remus_state {
...
> +    libxl__remus_device_state dev_state;
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I meant: why is it a separate structure, rather than the contents
simply being included there ?


Thanks for your other replies.  You don't seem to have replied to
everything I said, including some questions I asked.  Do you intend
to, later, perhaps ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
@ 2014-07-10 17:35 Ian Jackson
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Jackson @ 2014-07-10 17:35 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell

[-- Attachment #1: Type: text/plain, Size: 250 bytes --]

Sender: xen-devel-bounces@lists.xen.org
On-Behalf-Of: Ian.Jackson@eu.citrix.com
Subject: Re: [Xen-devel] [PATCH v15 2/7] remus: introduce remus device
Message-Id: <21438.52873.790154.902193@mariner.uk.xensource.com>
Recipient: ibudiman@overstock.com

[-- Attachment #2: Type: message/rfc822, Size: 9776 bytes --]

From: Ian Jackson <Ian.Jackson@eu.citrix.com>
To: Hongyang Yang <yanghy@cn.fujitsu.com>
Cc: <laijs@cn.fujitsu.com>, <wency@cn.fujitsu.com>, <andrew.cooper3@citrix.com>, <yunhong.jiang@intel.com>, <eddie.dong@intel.com>, <xen-devel@lists.xen.org>, <rshriram@cs.ubc.ca>, <ian.campbell@citrix.com>
Subject: Re: [Xen-devel] [PATCH v15 2/7] remus: introduce remus device
Date: Thu, 10 Jul 2014 18:34:01 +0100
Message-ID: <21438.52873.790154.902193@mariner.uk.xensource.com>

Hongyang Yang writes ("Re: [PATCH v15 2/7] remus: introduce remus device"):
> On 07/10/2014 01:26 AM, Ian Jackson wrote:
> > Sorry to mention this now, but I think it would be clearer if all
> > these libxl__remus_device_* functions which manipulate _all_ the
> > devices for a domain used the plural of "device", ie
> > libxl__remus_devices_setup, etc.
> 
> Ok

Thanks.  You may find git-filter-branch helpful to do this easily.
(Be careful not to blow your leg off.)

> >> +    rds->num_devices++;
> >> +    /*
> >> +     * we add devices that have been setuped to the array no matter
> >> +     * the setup process succeed or failed because we need to ensure
> >> +     * the device been teardown while setup failed. If any of the
> >> +     * device setup failed, we will quit remus, but before we exit,
> >> +     * we will teardown the devices that have been added to **dev
> >> +     */
> >> +    rds->dev[rds->num_set_up++] = dev;
> >> +    if (rc) {
> >> +        /* setup failed */
> >> +        rs->saved_rc = ERROR_FAIL;
> >> +    }
> >
> > This doesn't look right.  If the setup fails, presumably we shouldn't
> > put the device in the array ?  If we do it will presumably be torn
> > down later.
> 
> the netbuf script was designed as below:
> 1. when setup failed, the script won't teardown the device itself.
> 2. the teardown op is ok to be excute many times.

Aha.  Right.

I think you should state exactly that, probably in a comment here and
also in the script itself.  This can replace the comment you have
above, which is rather vague.

> In the remus device layer, if one device setup failed(whether script
> exit or the script get killed or something), we will quit
> remus, but before that, we will teardown all device that has been set
> up, whether it's correctly set up or not. So if we don't put the
> device in the array, we will get a leaked device, that has not been
> teardown.

That makes sense.

> > I still don't understand why libxl__remus_device_state is not part of
> > libxl__remus_state.
> 
> libxl__remus_device_state is part of libxl__remus_state:
> +struct libxl__remus_state {
...
> +    libxl__remus_device_state dev_state;
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I meant: why is it a separate structure, rather than the contents
simply being included there ?


Thanks for your other replies.  You don't seem to have replied to
everything I said, including some questions I asked.  Do you intend
to, later, perhaps ?

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices
  2014-07-10  3:03     ` Hongyang Yang
@ 2014-07-10 17:43       ` Ian Jackson
  2014-07-11  2:18         ` Hongyang Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2014-07-10 17:43 UTC (permalink / raw)
  To: Hongyang Yang
  Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, Ian Jackson,
	xen-devel, eddie.dong, rshriram, laijs

Hongyang Yang writes ("Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices"):
> On 07/10/2014 07:15 AM, Ian Jackson wrote:
> > It occurs to me to ask: Is it wise to use such a bare name for the
> > variable IFB ?  I would suggest that it might be better to use a
> > variable with XEN or REMUS or something in it.  Otherwise perhaps
> > someone somewhere might think IFB stood for something else and would
> > set it to something irrelevant (or your setting of it might break
> > their usage...)
> 
> Maybe REMUS_IFB?

Sounds good to me.

> >> +struct libxl__remus_netbuf_state {
> >> +    libxl__ao *ao;
> >> +    uint32_t domid;
> >> +    const char *netbufscript;
> >
> > Why is there a copy of netbufscript here ?
> 
> We use netbuf state in device concrete layer, so we made a copy
> of this.

What I mean is, why does the concrete layer not have access to the
remus_state ?  Is there something wrong with passing it the
remus_state ?

> >> +/*----- init() and destroy() -----*/
> >> +
> >> +static int nic_init(const libxl__remus_device_ops *self,
> >> +                    libxl__remus_state *rs)
> >> +{
> >
> > This function seems to leak the things it allocates, on error.
> > The `out' section should probably call destroy.
> 
> If this failed, we will call teardown, and in teardown, the
> nic_destroy will be called.

OIC.  I think this needs to be documented in a comment in
libxl__remus_device_ops.  Normally one wouldn't expect to have to call
destroy() if init() fails.

> > Last time I commented on the error handling in this patch.  I see that
> > you have added an nl_geterror in some places.
> >
> > When I make a comment like that, you need to apply it whereever
> > relevant in the code, and to the whole patch series.
> 
> Yes, actually I've checked the whole patch about this and
> corrected them. for this special case, nl_socket_alloc() doesn't
> return an error number, so we can't use nl_geterror, it returns
> NULL if alloc failed, so I think this error message should be ok.

Err, OK.

> >> +static void nic_destroy(const libxl__remus_device_ops *self,
> >> +                        libxl__remus_state *rs)
> > ...
> >> +    /* free qdisc cache */
> >> +    if (ns->qdisc_cache) {
> >> +        nl_cache_clear(ns->qdisc_cache);
> >> +        nl_cache_free(ns->qdisc_cache);
> >
> > Can these functions fail ?  If so, what does it mean and what should
> > the program do ?
> 
> No, they both return void.

Oh, good.

> >> +static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
> >> +                           libxl__remus_netbuf_state *netbuf_state,
> >> +                           int buffer_op)
> >> +{
> >> +    int rc;
> >> +
> >> +    STATE_AO_GC(netbuf_state->ao);
> >> +
> >> +    if (buffer_op == tc_buffer_start)
> >> +        rc = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
> >> +    else
> >> +        rc = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
> >
> > Error handling again.
> 
> I can't quite follow you, what's wrong with the error handling in this
> function except the name of the "rc"?

In fact reading it again I was confused by the fact that this function
doesn't actually return rc.  Sorry about that - but I guess this shows
why using the conventional names etc. is important.

> > How does rtnl_link_get_ifindex report its error code ?  Or can it fail
> > only one way ?
> 
> It Returns Interface index or 0 if not set. No error code will be returned.

Oh.  The error handling in this rtnl facility you're using does seem
rather poor - but of course this isn't your fault.  Thanks for
clarifying.

Thanks for your comprehensive reply.

Ian.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
  2014-07-10 17:34       ` Ian Jackson
@ 2014-07-11  2:12         ` Hongyang Yang
  2014-07-11  6:16           ` Hongyang Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Hongyang Yang @ 2014-07-11  2:12 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell



On 07/11/2014 01:34 AM, Ian Jackson wrote:
> Hongyang Yang writes ("Re: [PATCH v15 2/7] remus: introduce remus device"):
>> On 07/10/2014 01:26 AM, Ian Jackson wrote:
>>> Sorry to mention this now, but I think it would be clearer if all
>>> these libxl__remus_device_* functions which manipulate _all_ the
>>> devices for a domain used the plural of "device", ie
>>> libxl__remus_devices_setup, etc.
>>
>> Ok
>
> Thanks.  You may find git-filter-branch helpful to do this easily.
> (Be careful not to blow your leg off.)

Thanks for the tip.

>
>>>> +    rds->num_devices++;
>>>> +    /*
>>>> +     * we add devices that have been setuped to the array no matter
>>>> +     * the setup process succeed or failed because we need to ensure
>>>> +     * the device been teardown while setup failed. If any of the
>>>> +     * device setup failed, we will quit remus, but before we exit,
>>>> +     * we will teardown the devices that have been added to **dev
>>>> +     */
>>>> +    rds->dev[rds->num_set_up++] = dev;
>>>> +    if (rc) {
>>>> +        /* setup failed */
>>>> +        rs->saved_rc = ERROR_FAIL;
>>>> +    }
>>>
>>> This doesn't look right.  If the setup fails, presumably we shouldn't
>>> put the device in the array ?  If we do it will presumably be torn
>>> down later.
>>
>> the netbuf script was designed as below:
>> 1. when setup failed, the script won't teardown the device itself.
>> 2. the teardown op is ok to be excute many times.
>
> Aha.  Right.
>
> I think you should state exactly that, probably in a comment here and
> also in the script itself.  This can replace the comment you have
> above, which is rather vague.
>
>> In the remus device layer, if one device setup failed(whether script
>> exit or the script get killed or something), we will quit
>> remus, but before that, we will teardown all device that has been set
>> up, whether it's correctly set up or not. So if we don't put the
>> device in the array, we will get a leaked device, that has not been
>> teardown.
>
> That makes sense.
>
>>> I still don't understand why libxl__remus_device_state is not part of
>>> libxl__remus_state.
>>
>> libxl__remus_device_state is part of libxl__remus_state:
>> +struct libxl__remus_state {
> ...
>> +    libxl__remus_device_state dev_state;
>>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I meant: why is it a separate structure, rather than the contents
> simply being included there ?
>

Err, I thought I've replied on the last thread about this, but I will
reply here.
I intend to use libxl__remus_state on upper layer, that is, in the
main flow of libxl layer, and use libxl__remus_device_state in both
remus abstract layer and concrete layer. I thought that will make
things more clear. But yes, there still some libxl__remus_state use
in remus abstract layer and concrete layer, I will fix it up in the
next version.

>
> Thanks for your other replies.  You don't seem to have replied to
> everything I said, including some questions I asked.  Do you intend
> to, later, perhaps ?

Sorry, I might have missed some of your comments or questions, but
that's not what I was intend to...I was trying to reply every question
you've pointed out. I will go back and go through your comments carefully.

>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices
  2014-07-10 17:43       ` Ian Jackson
@ 2014-07-11  2:18         ` Hongyang Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Hongyang Yang @ 2014-07-11  2:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell



On 07/11/2014 01:43 AM, Ian Jackson wrote:
> Hongyang Yang writes ("Re: [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices"):
>> On 07/10/2014 07:15 AM, Ian Jackson wrote:
>>> It occurs to me to ask: Is it wise to use such a bare name for the
>>> variable IFB ?  I would suggest that it might be better to use a
>>> variable with XEN or REMUS or something in it.  Otherwise perhaps
>>> someone somewhere might think IFB stood for something else and would
>>> set it to something irrelevant (or your setting of it might break
>>> their usage...)
>>
>> Maybe REMUS_IFB?
>
> Sounds good to me.
>
>>>> +struct libxl__remus_netbuf_state {
>>>> +    libxl__ao *ao;
>>>> +    uint32_t domid;
>>>> +    const char *netbufscript;
>>>
>>> Why is there a copy of netbufscript here ?
>>
>> We use netbuf state in device concrete layer, so we made a copy
>> of this.
>
> What I mean is, why does the concrete layer not have access to the
> remus_state ?  Is there something wrong with passing it the
> remus_state ?

I intend to make less connections between layers, but I will rethink
about this.

>
>>>> +/*----- init() and destroy() -----*/
>>>> +
>>>> +static int nic_init(const libxl__remus_device_ops *self,
>>>> +                    libxl__remus_state *rs)
>>>> +{
>>>
>>> This function seems to leak the things it allocates, on error.
>>> The `out' section should probably call destroy.
>>
>> If this failed, we will call teardown, and in teardown, the
>> nic_destroy will be called.
>
> OIC.  I think this needs to be documented in a comment in
> libxl__remus_device_ops.  Normally one wouldn't expect to have to call
> destroy() if init() fails.
>
>>> Last time I commented on the error handling in this patch.  I see that
>>> you have added an nl_geterror in some places.
>>>
>>> When I make a comment like that, you need to apply it whereever
>>> relevant in the code, and to the whole patch series.
>>
>> Yes, actually I've checked the whole patch about this and
>> corrected them. for this special case, nl_socket_alloc() doesn't
>> return an error number, so we can't use nl_geterror, it returns
>> NULL if alloc failed, so I think this error message should be ok.
>
> Err, OK.
>
>>>> +static void nic_destroy(const libxl__remus_device_ops *self,
>>>> +                        libxl__remus_state *rs)
>>> ...
>>>> +    /* free qdisc cache */
>>>> +    if (ns->qdisc_cache) {
>>>> +        nl_cache_clear(ns->qdisc_cache);
>>>> +        nl_cache_free(ns->qdisc_cache);
>>>
>>> Can these functions fail ?  If so, what does it mean and what should
>>> the program do ?
>>
>> No, they both return void.
>
> Oh, good.
>
>>>> +static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
>>>> +                           libxl__remus_netbuf_state *netbuf_state,
>>>> +                           int buffer_op)
>>>> +{
>>>> +    int rc;
>>>> +
>>>> +    STATE_AO_GC(netbuf_state->ao);
>>>> +
>>>> +    if (buffer_op == tc_buffer_start)
>>>> +        rc = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
>>>> +    else
>>>> +        rc = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
>>>
>>> Error handling again.
>>
>> I can't quite follow you, what's wrong with the error handling in this
>> function except the name of the "rc"?
>
> In fact reading it again I was confused by the fact that this function
> doesn't actually return rc.  Sorry about that - but I guess this shows
> why using the conventional names etc. is important.
>
>>> How does rtnl_link_get_ifindex report its error code ?  Or can it fail
>>> only one way ?
>>
>> It Returns Interface index or 0 if not set. No error code will be returned.
>
> Oh.  The error handling in this rtnl facility you're using does seem
> rather poor - but of course this isn't your fault.  Thanks for
> clarifying.
>
> Thanks for your comprehensive reply.
>
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
  2014-07-11  2:12         ` Hongyang Yang
@ 2014-07-11  6:16           ` Hongyang Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Hongyang Yang @ 2014-07-11  6:16 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell



On 07/11/2014 10:12 AM, Hongyang Yang wrote:
>
>
> On 07/11/2014 01:34 AM, Ian Jackson wrote:
>> Hongyang Yang writes ("Re: [PATCH v15 2/7] remus: introduce remus device"):
>>> On 07/10/2014 01:26 AM, Ian Jackson wrote:
>>>> Sorry to mention this now, but I think it would be clearer if all
>>>> these libxl__remus_device_* functions which manipulate _all_ the
>>>> devices for a domain used the plural of "device", ie
>>>> libxl__remus_devices_setup, etc.
>>>
>>> Ok
>>
>> Thanks.  You may find git-filter-branch helpful to do this easily.
>> (Be careful not to blow your leg off.)
>
> Thanks for the tip.
>
>>
>>>>> +    rds->num_devices++;
>>>>> +    /*
>>>>> +     * we add devices that have been setuped to the array no matter
>>>>> +     * the setup process succeed or failed because we need to ensure
>>>>> +     * the device been teardown while setup failed. If any of the
>>>>> +     * device setup failed, we will quit remus, but before we exit,
>>>>> +     * we will teardown the devices that have been added to **dev
>>>>> +     */
>>>>> +    rds->dev[rds->num_set_up++] = dev;
>>>>> +    if (rc) {
>>>>> +        /* setup failed */
>>>>> +        rs->saved_rc = ERROR_FAIL;
>>>>> +    }
>>>>
>>>> This doesn't look right.  If the setup fails, presumably we shouldn't
>>>> put the device in the array ?  If we do it will presumably be torn
>>>> down later.
>>>
>>> the netbuf script was designed as below:
>>> 1. when setup failed, the script won't teardown the device itself.
>>> 2. the teardown op is ok to be excute many times.
>>
>> Aha.  Right.
>>
>> I think you should state exactly that, probably in a comment here and
>> also in the script itself.  This can replace the comment you have
>> above, which is rather vague.
>>
>>> In the remus device layer, if one device setup failed(whether script
>>> exit or the script get killed or something), we will quit
>>> remus, but before that, we will teardown all device that has been set
>>> up, whether it's correctly set up or not. So if we don't put the
>>> device in the array, we will get a leaked device, that has not been
>>> teardown.
>>
>> That makes sense.
>>
>>>> I still don't understand why libxl__remus_device_state is not part of
>>>> libxl__remus_state.
>>>
>>> libxl__remus_device_state is part of libxl__remus_state:
>>> +struct libxl__remus_state {
>> ...
>>> +    libxl__remus_device_state dev_state;
>>>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> I meant: why is it a separate structure, rather than the contents
>> simply being included there ?
>>
>
> Err, I thought I've replied on the last thread about this, but I will
> reply here.
> I intend to use libxl__remus_state on upper layer, that is, in the
> main flow of libxl layer, and use libxl__remus_device_state in both
> remus abstract layer and concrete layer. I thought that will make
> things more clear. But yes, there still some libxl__remus_state use
> in remus abstract layer and concrete layer, I will fix it up in the
> next version.

I reconsidered about this case, most part of libxl__remus_state can be
merged into libxl__remus_device_state, there's only one left:
libxl__ev_time timeout; /* used for checkpoint */
but I think this can be moved to libxl__domain_suspend_state just like
interval for remus:
struct libxl__domain_suspend_state {
...snip...
     const char *dm_savefile;
     int interval; /* checkpoint interval (for Remus) */
+   libxl__ev_time checkpoint_timeout; /* used for remus checkpoint */
...snip...
};

So, I'll merge libxl__remus_state into libxl__remus_device_state.

>
>>
>> Thanks for your other replies.  You don't seem to have replied to
>> everything I said, including some questions I asked.  Do you intend
>> to, later, perhaps ?
>
> Sorry, I might have missed some of your comments or questions, but
> that's not what I was intend to...I was trying to reply every question
> you've pointed out. I will go back and go through your comments carefully.
>
>>
>> Thanks,
>> Ian.
>> .
>>
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v15 2/7] remus: introduce remus device
  2014-07-09 17:26   ` Ian Jackson
  2014-07-10  2:18     ` Hongyang Yang
@ 2014-07-11  8:18     ` Hongyang Yang
  1 sibling, 0 replies; 20+ messages in thread
From: Hongyang Yang @ 2014-07-11  8:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
	xen-devel, rshriram, ian.campbell


On 07/10/2014 01:26 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v15 2/7] remus: introduce remus device"):
>> introduce remus device, an abstract layer of remus devices(nic, disk,
>> etc).It provides the following APIs for libxl:
>>    >libxl__remus_device_setup
>>      setup remus devices, like attach qdisc, enable disk buffering, etc
>>    >libxl__remus_device_teardown
>>      teardown devices
>>    >libxl__remus_device_postsuspend
>>    >libxl__remus_device_preresume
>>    >libxl__remus_device_commit
>>      above three are for checkpoint.
>
> Sorry to mention this now, but I think it would be clearer if all
> these libxl__remus_device_* functions which manipulate _all_ the
> devices for a domain used the plural of "device", ie
> libxl__remus_devices_setup, etc.
>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index 39f1c28..bcbd02b 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -733,6 +733,31 @@ out:
>>   static void remus_failover_cb(libxl__egc *egc,
>>                                 libxl__domain_suspend_state *dss, int rc);
>>
>> +static void libxl__remus_setup_failed(libxl__egc *egc,
>> +                                      libxl__remus_state *rs, int rc)
>> +{
>> +    STATE_AO_GC(rs->ao);
>> +    libxl__ao_complete(egc, ao, rc);
>> +}
>> +
>> +static void libxl__remus_setup_done(libxl__egc *egc,
>> +                                    libxl__remus_state *rs, int rc)
>
> This callback appears after the code which sets it up, which naturally
> executes first.  Things should be in chronological order, as I said.
>
>> +{
>> +    libxl__domain_suspend_state *dss = CONTAINER_OF(rs, *dss, rs);
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    if (!rc) {
>> +        libxl__domain_suspend(egc, dss);
>> +        return;
>> +    }
>> +
>> +    LOG(ERROR, "Remus: failed to setup device for guest with domid %u",
>> +        dss->domid);
>
> This log message should report the rc.

Ok, thanks.

>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 83eb29a..8cc90dc 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1454,6 +1454,17 @@ static void libxl__domain_suspend_callback(void *data)
>>       domain_suspend_callback_common(egc, dss);
>>   }
>>
>> +static void remus_device_postsuspend_cb(libxl__egc *egc,
>> +                                        libxl__remus_state *rs, int rc)
>> +{
>
> I can't quite tell, but isn't this code not in chronological order ?
>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index f8441f6..8ef20a0 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
> ...
>> +/*
>> + * This structure is for remus device layer, it records remus devices
>> + * that have been setuped.
>> + */
>
> Still lots of uses of "setuped".  Correct would be "devices that have
> been set up".
>
>> +/*----- checkpointing APIs -----*/
>> +
>> +/* callbacks */
>> +
>> +static void device_common_cb(libxl__egc *egc,
>> +                             libxl__remus_device *dev,
>> +                             int rc)
>> +{
>
> I asked for things to be in chronological order in the file.  I think
> this function therefore ought to be below these checkpoint functions.
>
> Also its name isn't very suggestive.  Perhaps it should have
> "checkpoint" in the name ?

Yes, I shall put "checkpoint" in the name.

>
>> +/* API implementations */
>> +
>> +void libxl__remus_device_postsuspend(libxl__egc *egc, libxl__remus_state *rs)
> ...
>> +void libxl__remus_device_preresume(libxl__egc *egc, libxl__remus_state *rs)
> ...
>> +void libxl__remus_device_commit(libxl__egc *egc, libxl__remus_state *rs)
>
> These three functions are nearly identical.  They should be combined
> somehow.  Perhaps there should be a single function which takes the
> offset of the per-device operation vtable member as a parameter.  Or
> perhaps the three functions should be generated by a macro.  Currently
> I'm leaning towards the former.
>
>> +    if(rds->num_set_up == 0)
>           ^
> Coding style.
>
>> +    for (i = 0; i < rds->num_set_up; i++) {
>> +        dev = rds->dev[i];
>> +        dev->callback = device_common_cb;
>> +        if (dev->ops->commit) {
>> +            dev->ops->commit(dev);
>> +        } else {
>> +            rds->num_devices++;
>> +            if (rds->num_devices == rds->num_set_up)
>> +                rs->callback(egc, rs, rs->saved_rc);
>
> Why not have this call device_common_cb rather than open-coding it ?
>
>> +/*----- main flow of control -----*/
>
> This is for setup, right ?  Or teardown ?  Or both ?

setup and teardown, will change this comment.

>
>> +/* callbacks */
>> +
>> +static void device_setup_cb(libxl__egc *egc,
>> +                            libxl__remus_device *dev,
>> +                            int rc);
>> +static void device_match_cb(libxl__egc *egc,
>> +                            libxl__remus_device *dev,
>> +                            int rc)
>> +{
>
> This is still not in chronological order.  You must put callbacks
> after the things that set them up, since obviously the callback
> happens after it is set up.
>
> Seeing things in the wrong order makes it hard to see what's going on.
> In the interests of trying to reduce the number of iterations of this
> series I'm going to do a kind of peephole review of what I see here;
> you should bear in mind that I'm not always trying to understand what
> the purpose of these functions is so I may have structural comments on
> your next version.
>
>> +    libxl__remus_device_state *const rds = dev->rds;
>> +    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    if (rc) {
>> +        if (++dev->ops_index >= ARRAY_SIZE(dev_ops) ||
>
> This use of ++ looks confusing to me.
>
>> +            rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {
>> +            /* the device can not be matched */
>> +            rds->num_devices++;
>> +            rs->saved_rc = ERROR_FAIL;
>
> Please can we not return more ERROR_FAIL.  Instead, why not return
> ERROR_REMUS_DEVICE_NOT_SUPPORTED as I suggested before ?
>
>> +    rds->num_devices++;
>> +    /*
>> +     * we add devices that have been setuped to the array no matter
>> +     * the setup process succeed or failed because we need to ensure
>> +     * the device been teardown while setup failed. If any of the
>> +     * device setup failed, we will quit remus, but before we exit,
>> +     * we will teardown the devices that have been added to **dev
>> +     */
>> +    rds->dev[rds->num_set_up++] = dev;
>> +    if (rc) {
>> +        /* setup failed */
>> +        rs->saved_rc = ERROR_FAIL;
>> +    }
>
> This doesn't look right.  If the setup fails, presumably we shouldn't
> put the device in the array ?  If we do it will presumably be torn
> down later.
>
>> +static void destroy_device_ops(libxl__remus_state *rs);
>> +static void device_teardown_cb(libxl__egc *egc,
>> +                               libxl__remus_device *dev,
>> +                               int rc)
>> +{
>> +    libxl__remus_device_state *const rds = dev->rds;
>> +    libxl__remus_state *rs = CONTAINER_OF(rds, *rs, dev_state);
>> +
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* ignore teardown errors to teardown as many devs as possible*/
>> +    rds->num_set_up--;
>
> We should at least preserve the rc so that we don't pretend that
> everything worked if it didn't.
>
> You may need to invent a new teardown rc variable, and/or add logging,
> so that you get the right error code out, and the right log messages,
> if a setup fails and causes a teardown which also fails.
>
>> +static int init_device_ops(libxl__remus_state *rs)
>> +{
>> +    int i, rc;
>> +    const libxl__remus_device_ops *ops;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(dev_ops); i++) {
>> +        ops = dev_ops[i];
>> +        if (ops->init(ops, rs)) {
>> +            rc = ERROR_FAIL;
>
> In general, you should preserve the rc, not turn everything into
> ERROR_FAIL.
>
> I still don't understand why libxl__remus_device_state is not part of
> libxl__remus_state.
>
> I wonder if you need a different name for "the abstract thing which is
> represented by a libxl__remus_device_ops".  Since each ops only deals
> with one kind (but perhaps a kind may have several ops), I suggest
> calling them "subkind"s or "devsubkinds" or something.  At the moment
> you seem to be calling them "ops".
>
> As a result I think this function is poorly named.  It doesn't
> initialise any ops.  It initialises each subkind.  Perhaps it should
> be called init_device_subkind.
>
> You might consider whether libxl__remus_device_ops should be
> libxl__remus_subkind_ops perhaps or libxl__remus_devsubkind_ops or
> libxl__remus_device_subkind_ops.
>
>> +void libxl__remus_device_setup(libxl__egc *egc, libxl__remus_state *rs)
>> +{
>> +    STATE_AO_GC(rs->ao);
>> +
>> +    /* Convenience aliases */
>> +    libxl__remus_device_state *const rds = &rs->dev_state;
>> +
>> +    if (ARRAY_SIZE(dev_ops) == 0)
>> +        goto out;
>
> Why is this special case necessary ?

This means there're no ops present, the following process of setup are
unnecessary, so we just goto out and return.

>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index de25f42..e56567f 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -44,6 +44,7 @@ libxl_error = Enumeration("error", [
>>       (-12, "OSEVENT_REG_FAIL"),
>>       (-13, "BUFFERFULL"),
>>       (-14, "UNKNOWN_CHILD"),
>> +    (-15, "REMUS_DEVOPS_NOT_MATCH"),
>
> I don't object to this name for this exact case and it's fine to have
> a specific error code.
>
> But if you are going to make the whole machinery return
> ERROR_REMUS_DEVICE_NOT_SUPPORTED if no subkind matches, then you can
> reuse that error code for the subkind methods.
>
> Thanks,
> Ian.
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-07-11  8:18 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-07  8:26 [PATCH v15 0/7] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
2014-07-07  8:26 ` [PATCH v15 1/7] remus: add libnl3 dependency for network buffering support Yang Hongyang
2014-07-07  8:26 ` [PATCH v15 2/7] remus: introduce remus device Yang Hongyang
2014-07-09 17:26   ` Ian Jackson
2014-07-10  2:18     ` Hongyang Yang
2014-07-10 17:34       ` Ian Jackson
2014-07-11  2:12         ` Hongyang Yang
2014-07-11  6:16           ` Hongyang Yang
2014-07-11  8:18     ` Hongyang Yang
2014-07-07  8:26 ` [PATCH v15 3/7] remus netbuffer: implement remus network buffering for nic devices Yang Hongyang
2014-07-09 23:15   ` Ian Jackson
2014-07-10  3:03     ` Hongyang Yang
2014-07-10 17:43       ` Ian Jackson
2014-07-11  2:18         ` Hongyang Yang
2014-07-10 12:53     ` Ian Campbell
2014-07-07  8:26 ` [PATCH v15 4/7] remus drbd: Implement remus drbd replicated disk Yang Hongyang
2014-07-07  8:26 ` [PATCH v15 5/7] libxl: network buffering cmdline switch Yang Hongyang
2014-07-07  8:26 ` [PATCH v15 6/7] libxl: disk " Yang Hongyang
2014-07-07  8:26 ` [PATCH v15 7/7] MAINTAINERS: Update maintained files of REMUS Yang Hongyang
  -- strict thread matches above, loose matches on Subject: below --
2014-07-10 17:35 [PATCH v15 2/7] remus: introduce remus device Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.