xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support
@ 2013-11-18 20:03 Shriram Rajagopalan
  2013-11-18 20:03 ` [PATCH 1 of 4 V5] remus: add libnl3 dependency to autoconf scripts Shriram Rajagopalan
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-18 20:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monne, Ian Jackson, Ian Campbell,
	Stefano Stabellini

This patch series adds support for network buffering in the Remus
codebase in libxl.  NB: This series does not contain the
bug fix related to usleep calls in libxl__domain_suspend.
I will send it out later as a separate series.

Changes in V5:

Merge hotplug script patch (2/5) and hotplug script setup/teardown
patch (3/5) into a single patch.

Changes in V4:

[1/5] Remove check for libnl command line utils in autoconf checks

[2/5] minor nits

[3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h

[4/5] clean ups. Make the usleep in checkpoint callback asynchronous

[5/5] minor nits

Changes in V3:
[1/5] Fix redundant checks in configure scripts
      (based on Ian Campbell's suggestions)

[2/5] Introduce locking in the script, during IFB setup.
      Add xenstore paths used by netbuf scripts
      to xenstore-paths.markdown

[3/5] Hotplug scripts setup/teardown invocations are now asynchronous
      following IanJ's feedback.  However, the invocations are still
      sequential. 

[5/5] Allow per-domain specification of netbuffer scripts in xl remus
      commmand.

And minor nits throughout the series based on feedback from
the last version

Changes in V2:
[1/5] Configure script will automatically enable/disable network
      buffer support depending on the availability of the appropriate
      libnl3 version. [If libnl3 is unavailable, a warning message will be
      printed to let the user know that the feature has been disabled.]

      use macros from pkg.m4 instead of pkg-config commands
      removed redundant checks for libnl3 libraries.

[3,4/5] - Minor nits.

Version 1:

[1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
      to libxl Makefile.

[2/5] External script to setup/teardown network buffering using libnl3's
      CLI. This script will be invoked by libxl before starting Remus.
      The script's main job is to bring up an IFB device with plug qdisc
      attached to it.  It then re-routes egress traffic from the guest's
      vif to the IFB device.

[3/5] Libxl code to invoke the external setup script, followed by netlink
      related setup to obtain a handle on the output buffers attached
      to each vif.

[4/5] Libxl interaction with network buffer module in the kernel via
      libnl3 API.

[5/5] xl cmdline switch to explicitly enable network buffering when
      starting remus.


  Few things to note: 

    a) Based on previous email discussions, the setup/teardown task has
    been moved to a hotplug style shell script which can be customized as
    desired, instead of implementing it as C code inside libxl.

    b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
   (Linux).  So I have made network buffering support an optional feature
   so that it can be disabled if desired.

   c) NetBSD does not have libnl3. So I have put the setup script under
   tools/hotplug/Linux folder.

thanks
shriram

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1 of 4 V5] remus: add libnl3 dependency to autoconf scripts
  2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
@ 2013-11-18 20:03 ` Shriram Rajagopalan
  2013-11-18 20:03 ` [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code Shriram Rajagopalan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-18 20:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monne, Ian Jackson, Ian Campbell,
	Stefano Stabellini

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1384798655 28800
# Node ID 4b471809a5bb016afb2d24521b5f8f548057b29f
# Parent  974c8559f8234d46c0e5cd06a5caca072548f4f8
remus: add libnl3 dependency to autoconf scripts

Libnl3 is required for controlling Remus network buffering.
This patch adds dependency on libnl3 (>= 3.2.8) to autoconf scripts.
Also provide ability to configure tools without libnl3 support, that
is without network buffering support.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r 974c8559f823 -r 4b471809a5bb README
--- a/README	Mon Nov 18 09:38:07 2013 -0800
+++ b/README	Mon Nov 18 10:17:35 2013 -0800
@@ -71,6 +71,10 @@ disabled at compile time:
       includes the alternative ocaml xenstored.
     * cmake (if building vtpm stub domains)
     * markdown
+    * Development install of libnl3 (e.g., libnl-3-200,
+      libnl-3-dev, etc).  Required if network buffering is desired
+      when using Remus with libxl.  See tools/remus/README for detailed
+      information.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
diff -r 974c8559f823 -r 4b471809a5bb config/Tools.mk.in
--- a/config/Tools.mk.in	Mon Nov 18 09:38:07 2013 -0800
+++ b/config/Tools.mk.in	Mon Nov 18 10:17:35 2013 -0800
@@ -38,6 +38,8 @@ PTHREAD_LIBS        := @PTHREAD_LIBS@
 
 PTYFUNCS_LIBS       := @PTYFUNCS_LIBS@
 
+LIBNL3_LIBS         := @LIBNL3_LIBS@
+LIBNL3_CFLAGS       := @LIBNL3_CFLAGS@
 # Download GIT repositories via HTTP or GIT's own protocol?
 # GIT's protocol is faster and more robust, when it works at all (firewalls
 # may block it). We make it the default, but if your GIT repository downloads
@@ -56,6 +58,7 @@ CONFIG_QEMU_TRAD    := @qemu_traditional
 CONFIG_QEMU_XEN     := @qemu_xen@
 CONFIG_XEND         := @xend@
 CONFIG_BLKTAP1      := @blktap1@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
 
 #System options
 ZLIB                := @zlib@
diff -r 974c8559f823 -r 4b471809a5bb tools/configure.ac
--- a/tools/configure.ac	Mon Nov 18 09:38:07 2013 -0800
+++ b/tools/configure.ac	Mon Nov 18 10:17:35 2013 -0800
@@ -230,6 +230,21 @@ AC_SUBST(libiconv)
 # Checks for header files.
 AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h])
 
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+		[libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+	    AC_MSG_WARN([Disabling support for Remus network buffering.
+	    Please install libnl3 libraries, command line tools and devel
+	    headers - version 3.2.8 or higher])
+	    AC_SUBST(remus_netbuf, [n])
+	    ],[
+	    AC_SUBST(LIBNL3_LIBS)
+	    AC_SUBST(LIBNL3_CFLAGS)
+	    AC_SUBST(remus_netbuf, [y])
+])
+
 AC_OUTPUT()
 
 AS_IF([test "x$xend" = "xy" ], [
diff -r 974c8559f823 -r 4b471809a5bb tools/libxl/Makefile
--- a/tools/libxl/Makefile	Mon Nov 18 09:38:07 2013 -0800
+++ b/tools/libxl/Makefile	Mon Nov 18 10:17:35 2013 -0800
@@ -21,11 +21,13 @@ endif
 
 LIBXL_LIBS =
 LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+LIBXL_LIBS += $(LIBNL3_LIBS)
 
 CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
 CFLAGS_LIBXL += $(CFLAGS_libxenguest)
 CFLAGS_LIBXL += $(CFLAGS_libxenstore)
 CFLAGS_LIBXL += $(CFLAGS_libblktapctl) 
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
 CFLAGS_LIBXL += -Wshadow
 
 CFLAGS += $(PTHREAD_CFLAGS)
diff -r 974c8559f823 -r 4b471809a5bb tools/remus/README
--- a/tools/remus/README	Mon Nov 18 09:38:07 2013 -0800
+++ b/tools/remus/README	Mon Nov 18 10:17:35 2013 -0800
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtu
 checkpoints to a backup, which will activate if the target VM fails.
 
 See the website at http://nss.cs.ubc.ca/remus/ for details.
+
+Using Remus with libxl on Xen 4.4 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code
  2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
  2013-11-18 20:03 ` [PATCH 1 of 4 V5] remus: add libnl3 dependency to autoconf scripts Shriram Rajagopalan
@ 2013-11-18 20:03 ` Shriram Rajagopalan
  2013-11-25 15:32   ` Ian Jackson
  2013-11-18 20:03 ` [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks Shriram Rajagopalan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-18 20:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monne, Ian Jackson, Ian Campbell,
	Stefano Stabellini

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1384801774 28800
# Node ID 5b4e029bb79b8db4210a182911a4ff7ec7957c4f
# Parent  4b471809a5bb016afb2d24521b5f8f548057b29f
tools/libxl: Remus network buffering - hotplug scripts and setup code

This patch introduces remus-netbuf-setup hotplug script responsible for
setting up and tearing down the necessary infrastructure required for
network output buffering in Remus.  This script is intended to be invoked
by libxl for each guest interface, when starting or stopping Remus.

Apart from returning success/failure indication via the usual hotplug
entries in xenstore, this script also writes to xenstore, the name of
the IFB device to be used to control the vif's network output.

The script relies on libnl3 command line utilities to perform various
setup/teardown functions. The script is confined to Linux platforms only
since NetBSD does not seem to have libnl3.

Additionally, this patch introduces code to setup/teardown remus network
buffering for a given guest, when libxl_domain_remus_start API is invoked.

The following steps are taken during setup:
 a) call the hotplug script for each vif to setup its network buffer

 b) establish a dedicated remus context containing libnl related
    state (netlink sockets, qdisc caches, etc.,)

 c) Obtain handles to plug qdiscs installed on the IFB devices
    chosen by the hotplug scripts.

During teardown, the netlink resources are released, followed by
invocation of hotplug scripts to remove the IFB devices.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 4b471809a5bb -r 5b4e029bb79b docs/misc/xenstore-paths.markdown
--- a/docs/misc/xenstore-paths.markdown	Mon Nov 18 10:17:35 2013 -0800
+++ b/docs/misc/xenstore-paths.markdown	Mon Nov 18 11:09:34 2013 -0800
@@ -356,6 +356,10 @@ The guest's virtual time offset from UTC
 
 The device model version for a domain.
 
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+IFB device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff -r 4b471809a5bb -r 5b4e029bb79b tools/hotplug/Linux/Makefile
--- a/tools/hotplug/Linux/Makefile	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/hotplug/Linux/Makefile	Mon Nov 18 11:09:34 2013 -0800
@@ -16,6 +16,7 @@ XEN_SCRIPTS += network-nat vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff -r 4b471809a5bb -r 5b4e029bb79b tools/hotplug/Linux/remus-netbuf-setup
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/hotplug/Linux/remus-netbuf-setup	Mon Nov 18 11:09:34 2013 -0800
@@ -0,0 +1,183 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB	      ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+#specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+	if ! modinfo $m > /dev/null 2>&1; then
+	    fatal "Unable to find $m kernel module"
+	fi
+    done
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+	local installed=`nl-qdisc-list -d $ifb`
+	[ -n "$installed" ] && continue
+	IFB="$ifb"
+	break
+    done
+
+    if [ -z "$IFB" ]
+    then
+	fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+	u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+	do_without_error tc qdisc del dev "$vif" ingress
+	fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+	do_without_error tc qdisc del dev "$vif" ingress
+	fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. Its okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+	--update plug --limit=10000000 >/dev/null 2>&1
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+	do_without_error ip link set dev "$ifb" down
+	do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+	xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+case "$command" in
+    setup)
+	check_libnl_tools
+	check_modules
+
+	claim_lock "pickifb"
+	setup_ifb
+	redirect_vif_traffic "$vifname" "$IFB"
+	add_plug_qdisc "$vifname" "$IFB"
+	release_lock "pickifb"
+
+	#not using xenstore_write that automatically exits on error
+	#because we need to cleanup
+	_xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+	success
+	;;
+    teardown)
+	: ${IFB:?}
+	teardown_netbuf "$vifname" "$IFB"
+	;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/Makefile
--- a/tools/libxl/Makefile	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/libxl/Makefile	Mon Nov 18 11:09:34 2013 -0800
@@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
 else
 LIBXL_OBJS-y += libxl_noblktap2.o
 endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
 LIBXL_OBJS-$(CONFIG_IA64) += libxl_nocpuid.o libxl_noarch.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_noarch.o
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl.c
--- a/tools/libxl/libxl.c	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/libxl/libxl.c	Mon Nov 18 11:09:34 2013 -0800
@@ -698,8 +698,9 @@ libxl_vminfo * libxl_list_vm(libxl_ctx *
     return ptr;
 }
 
-static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc);
+static void remus_replication_failure_cb(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss,
+                                         int rc);
 
 /* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
 int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
@@ -718,40 +719,57 @@ int libxl_domain_remus_start(libxl_ctx *
 
     GCNEW(dss);
     dss->ao = ao;
-    dss->callback = remus_failover_cb;
+    dss->callback = remus_replication_failure_cb;
     dss->domid = domid;
     dss->fd = send_fd;
     /* TODO do something with recv_fd */
     dss->type = type;
     dss->live = 1;
     dss->debug = 0;
-    dss->remus = info;
 
     assert(info);
 
-    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+    GCNEW(dss->remus_ctx);
+
+    /* convenience shorthand */
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    remus_ctx->blackhole = info->blackhole;
+    remus_ctx->interval = info->interval;
+    remus_ctx->compression = info->compression;
+    remus_ctx->dss = dss;
+
+    /* TODO: enable disk buffering */
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            remus_ctx->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            remus_ctx->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                          libxl__xen_script_dir_path());
+        }
+    }
 
     /* Point of no return */
-    libxl__domain_suspend(egc, dss);
+    libxl__remus_setup_initiate(egc, dss);
     return AO_INPROGRESS;
 
  out:
     return AO_ABORT(rc);
 }
 
-static void remus_failover_cb(libxl__egc *egc,
-                              libxl__domain_suspend_state *dss, int rc)
+static void remus_replication_failure_cb(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss,
+                                         int rc)
 {
     STATE_AO_GC(dss->ao);
-    /*
-     * With Remus, if we reach this point, it means either
-     * backup died or some network error occurred preventing us
-     * from sending checkpoints.
-     */
-
-    /* TBD: Remus cleanup - i.e. detach qdisc, release other
-     * resources.
-     */
     libxl__ao_complete(egc, ao, rc);
 }
 
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl.h
--- a/tools/libxl/libxl.h	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/libxl/libxl.h	Mon Nov 18 11:09:34 2013 -0800
@@ -375,6 +375,19 @@
  */
 #define LIBXL_HAVE_CREATEINFO_PVH 1
 
+/*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * setup or teardown network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
 /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be
  * called from within libxl itself. Callers outside libxl, who
  * do not #include libxl_internal.h, are fine. */
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/libxl/libxl_dom.c	Mon Nov 18 11:09:34 2013 -0800
@@ -1228,6 +1228,51 @@ int libxl__toolstack_save(uint32_t domid
     return 0;
 }
 
+/*----- remus setup/teardown code -----*/
+void libxl__remus_setup_initiate(libxl__egc *egc,
+                                 libxl__domain_suspend_state *dss)
+{
+    libxl__ev_time_init(&dss->remus_ctx->timeout);
+    if (!dss->remus_ctx->netbufscript)
+        libxl__remus_setup_done(egc, dss, 0);
+    else
+        libxl__remus_netbuf_setup(egc, dss);
+}
+
+void libxl__remus_setup_done(libxl__egc *egc,
+                             libxl__domain_suspend_state *dss,
+                             int rc)
+{
+    STATE_AO_GC(dss->ao);
+    if (!rc) {
+        libxl__domain_suspend(egc, dss);
+        return;
+    }
+
+    LOG(ERROR, "Remus: failed to setup network buffering"
+        " for guest with domid %u", dss->domid);
+    domain_suspend_done(egc, dss, rc);
+}
+
+void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                    libxl__domain_suspend_state *dss,
+                                    int rc)
+{
+    /* stash rc somewhere before invoking teardown ops. */
+    dss->remus_ctx->saved_rc = rc;
+
+    if (!dss->remus_ctx->netbuf_ctx)
+        libxl__remus_teardown_done(egc, dss);
+    else
+        libxl__remus_netbuf_teardown(egc, dss);
+}
+
+void libxl__remus_teardown_done(libxl__egc *egc,
+                                libxl__domain_suspend_state *dss)
+{
+    dss->callback(egc, dss, dss->remus_ctx->saved_rc);
+}
+
 /*----- remus callbacks -----*/
 
 static int libxl__remus_domain_suspend_callback(void *data)
@@ -1276,7 +1321,7 @@ static void remus_checkpoint_dm_saved(li
     /* REMUS TODO: Wait for disk and memory ack, release network buffer */
     /* REMUS TODO: make this asynchronous */
     assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->interval * 1000);
+    usleep(dss->remus_ctx->interval * 1000);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }
 
@@ -1294,7 +1339,6 @@ void libxl__domain_suspend(libxl__egc *e
     const libxl_domain_type type = dss->type;
     const int live = dss->live;
     const int debug = dss->debug;
-    const libxl_domain_remus_info *const r_info = dss->remus;
     libxl__srm_save_autogen_callbacks *const callbacks =
         &dss->shs.callbacks.save.a;
 
@@ -1329,11 +1373,8 @@ void libxl__domain_suspend(libxl__egc *e
     dss->guest_responded = 0;
     dss->dm_savefile = libxl__device_model_savefile(gc, domid);
 
-    if (r_info != NULL) {
-        dss->interval = r_info->interval;
-        if (r_info->compression)
-            dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
-    }
+    if (dss->remus_ctx && dss->remus_ctx->compression)
+        dss->xcflags |= XCFLAGS_CHECKPOINT_COMPRESS;
 
     dss->xce = xc_evtchn_open(NULL, 0);
     if (dss->xce == NULL)
@@ -1352,7 +1393,7 @@ void libxl__domain_suspend(libxl__egc *e
     }
 
     memset(callbacks, 0, sizeof(*callbacks));
-    if (r_info != NULL) {
+    if (dss->remus_ctx != NULL) {
         callbacks->suspend = libxl__remus_domain_suspend_callback;
         callbacks->postcopy = libxl__remus_domain_resume_callback;
         callbacks->checkpoint = libxl__remus_domain_checkpoint_callback;
@@ -1512,6 +1553,17 @@ static void domain_suspend_done(libxl__e
     if (dss->xce != NULL)
         xc_evtchn_close(dss->xce);
 
+    if (dss->remus_ctx) {
+        /*
+         * With Remus, if we reach this point, it means either
+         * backup died or some network error occurred preventing us
+         * from sending checkpoints. Teardown the network buffers and
+         * release netlink resources.  This is an async op.
+         */
+        libxl__remus_teardown_initiate(egc, dss, rc);
+        return;
+    }
+
     dss->callback(egc, dss, rc);
 }
 
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/libxl/libxl_internal.h	Mon Nov 18 11:09:34 2013 -0800
@@ -2291,6 +2291,50 @@ typedef struct libxl__logdirty_switch {
     libxl__ev_time timeout;
 } libxl__logdirty_switch;
 
+typedef struct libxl__remus_ctx {
+    /* checkpoint interval */
+    int interval;
+    int blackhole;
+    int compression;
+    int saved_rc;
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;
+    /* Opaque context containing network buffer related stuff */
+    void *netbuf_ctx;
+    libxl__domain_suspend_state *dss;
+    libxl__ev_time timeout;
+    libxl__ev_child child;
+    int num_exec;
+} libxl__remus_ctx;
+
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
+
+_hidden void libxl__remus_setup_initiate(libxl__egc *egc,
+                                         libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_setup_done(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     int rc);
+
+_hidden void libxl__remus_teardown_initiate(libxl__egc *egc,
+                                            libxl__domain_suspend_state *dss,
+                                            int rc);
+
+_hidden void libxl__remus_teardown_done(libxl__egc *egc,
+                                        libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_netbuf_setup(libxl__egc *egc,
+                                       libxl__domain_suspend_state *dss);
+
+_hidden void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                          libxl__domain_suspend_state *dss);
+
+_hidden int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+                                               libxl__remus_ctx *remus_ctx);
+
+_hidden int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+                                                  libxl__remus_ctx *remus_ctx);
+
 struct libxl__domain_suspend_state {
     /* set by caller of libxl__domain_suspend */
     libxl__ao *ao;
@@ -2301,7 +2345,7 @@ struct libxl__domain_suspend_state {
     libxl_domain_type type;
     int live;
     int debug;
-    const libxl_domain_remus_info *remus;
+    libxl__remus_ctx *remus_ctx;
     /* private */
     xc_evtchn *xce; /* event channel handle */
     int suspend_eventchn;
@@ -2309,7 +2353,6 @@ struct libxl__domain_suspend_state {
     int xcflags;
     int guest_responded;
     const char *dm_savefile;
-    int interval; /* checkpoint interval (for Remus) */
     libxl__save_helper_state shs;
     libxl__logdirty_switch logdirty;
     /* private for libxl__domain_save_device_model */
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_netbuffer.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/libxl/libxl_netbuffer.c	Mon Nov 18 11:09:34 2013 -0800
@@ -0,0 +1,516 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_ctx {
+    struct rtnl_qdisc **netbuf_qdisc_list;
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+    char **vif_list;
+    char **ifb_list;
+    uint32_t num_netbufs;
+    uint32_t unused;
+} libxl__remus_netbuf_ctx;
+
+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static char *get_vifname(libxl__gc *gc, uint32_t domid,
+                               libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    vifname = libxl__xs_read(gc, XBT_NULL,
+                             libxl__sprintf(gc,
+                                            "%s/backend/vif/%d/%d/vifname",
+                                            libxl__xs_get_dompath(gc, 0),
+                                            domid, nic->devid));
+    if (!vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return libxl__strdup(gc, vifname);
+}
+
+static char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
+                                 int *num_vifs)
+{
+    libxl_device_nic *nics = NULL;
+    int nb, i = 0;
+    char **vif_list = NULL;
+
+    *num_vifs = 0;
+    nics = libxl_device_nic_list(CTX, domid, &nb);
+    if (!nics) return NULL;
+
+    /* Ensure that none of the vifs are backed by driver domains */
+    for (i = 0; i < nb; i++) {
+        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
+            LOG(ERROR, "vif %s has driver domain (%u) as its backend.\n"
+                "Network buffering is not supported with driver domains",
+                get_vifname(gc, domid, &nics[i]), nics[i].backend_domid);
+            *num_vifs = -1;
+            goto end;
+        }
+    }
+
+    vif_list = libxl__calloc(gc, nb, sizeof(char *));
+    for (i = 0; i < nb; ++i)
+        vif_list[i] = get_vifname(gc, domid, &nics[i]);
+    *num_vifs = nb;
+
+ end:
+    for (i = 0; i < nb; i++)
+        libxl_device_nic_dispose(&nics[i]);
+    free(nics);
+    return vif_list;
+}
+
+static int init_qdiscs(libxl__gc *gc,
+                       libxl__remus_ctx *remus_ctx);
+
+static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx,
+                              char *op, libxl__ev_child_callback *death);
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__ev_child *child,
+                                   pid_t pid, int status);
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__ev_child *child,
+                                      pid_t pid, int status);
+
+static void netbuf_setup_timeout_cb(libxl__egc *egc,
+                                    libxl__ev_time *ev,
+                                    const struct timeval *requested_abs);
+
+static int init_qdiscs(libxl__gc *gc,
+                       libxl__remus_ctx *remus_ctx)
+{
+    int i, ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    /* convenience vars */
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+    int num_netbufs = netbuf_ctx->num_netbufs;
+    char **ifb_list = netbuf_ctx->ifb_list;
+
+    /* Now that we have brought up IFB devices with plug qdisc for
+     * each vif, lets get a netlink handle on the plug qdisc for use
+     * during checkpointing.
+     */
+    netbuf_ctx->nlsock = nl_socket_alloc();
+    if (!netbuf_ctx->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto end;
+    }
+
+    ret = nl_connect(netbuf_ctx->nlsock, NETLINK_ROUTE);
+    if (ret) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(ret));
+        goto end;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    ret = rtnl_qdisc_alloc_cache(netbuf_ctx->nlsock,
+                                 &netbuf_ctx->qdisc_cache);
+    if (ret) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(ret));
+        goto end;
+    }
+
+    /* list of handles to plug qdiscs */
+    netbuf_ctx->netbuf_qdisc_list =
+        libxl__calloc(gc, num_netbufs,
+                      sizeof(struct rtnl_qdisc *));
+
+    ifb = rtnl_link_alloc();
+
+    for (i = 0; i < num_netbufs; ++i) {
+
+        /* get a handle to the IFB interface */
+        ret = rtnl_link_get_kernel(netbuf_ctx->nlsock, 0,
+                                   ifb_list[i], &ifb);
+        if (ret) {
+            LOG(ERROR, "cannot obtain handle for %s: %s", ifb_list[i],
+                nl_geterror(ret));
+            goto end;
+        }
+
+        ifindex = rtnl_link_get_ifindex(ifb);
+        if (!ifindex) {
+            LOG(ERROR, "interface %s has no index", ifb_list[i]);
+            goto end;
+        }
+
+        /* Get a reference to the root qdisc installed on the IFB, by
+         * querying the qdisc list we obtained earlier. The netbufscript
+         * sets up the plug qdisc as the root qdisc, so we don't have to
+         * search the entire qdisc tree on the IFB dev.
+
+         * There is no need to explicitly free this qdisc as its just a
+         * reference from the qdisc cache we allocated earlier.
+         */
+        qdisc = rtnl_qdisc_get_by_parent(netbuf_ctx->qdisc_cache, ifindex,
+                                         TC_H_ROOT);
+
+        if (qdisc) {
+            const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+            /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+            if (!tc_kind || strcmp(tc_kind, "plug")) {
+                LOG(ERROR, "plug qdisc is not installed on %s", ifb_list[i]);
+                goto end;
+            }
+            netbuf_ctx->netbuf_qdisc_list[i] = qdisc;
+        } else {
+            LOG(ERROR, "Cannot get qdisc handle from ifb %s", ifb_list[i]);
+            goto end;
+        }
+    }
+
+    rtnl_link_put(ifb);
+    return 0;
+
+ end:
+    if (ifb) rtnl_link_put(ifb);
+    return ERROR_FAIL;
+}
+
+/* the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx,
+                              char *op, libxl__ev_child_callback *death)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    pid_t pid;
+    libxl__ev_child *child = &remus_ctx->child;
+    libxl__ev_time *timeout = &remus_ctx->timeout;
+    char *script = libxl__strdup(gc, remus_ctx->netbufscript);
+    uint32_t domid = remus_ctx->dss->domid;
+    int devid = remus_ctx->num_exec;
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+    char *vif = netbuf_ctx->vif_list[devid];
+    char *ifb = netbuf_ctx->ifb_list[devid];
+
+    if (!strcmp(op, "setup"))
+        arraysize = 5;
+    else
+        arraysize = 7;
+
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = vif;
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), devid);
+    if (!strcmp(op, "teardown")) {
+        env[nr++] = "IFB";
+        env[nr++] = ifb;
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    libxl__ev_child_init(child);
+
+    /* Set hotplug timeout */
+    if (libxl__ev_time_register_rel(gc, timeout,
+                                    netbuf_setup_timeout_cb,
+                                    LIBXL_HOTPLUG_TIMEOUT * 1000)) {
+        LOG(ERROR, "unable to register timeout for "
+            "netbuf setup script %s on vif %s", script, vif);
+        return ERROR_FAIL;
+    }
+
+    LOG(DEBUG, "Calling netbuf script: %s %s on vif %s",
+        script, op, vif);
+
+    /* Fork and exec netbuf script */
+    pid = libxl__ev_child_fork(gc, child, death);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork netbuf script %s", script);
+        return ERROR_FAIL;
+    }
+
+    if (!pid) {
+        /* child: Launch netbuf script */
+        libxl__exec(gc, -1, -1, -1, args[0], args, env);
+        /* notreached */
+        abort();
+    }
+
+    return 0;
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__ev_child *child,
+                                      pid_t pid, int status)
+{
+    libxl__remus_ctx *remus_ctx = CONTAINER_OF(child, *remus_ctx, child);
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+
+    STATE_AO_GC(remus_ctx->dss->ao);
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      remus_ctx->netbufscript,
+                                      pid, status);
+    }
+
+    remus_ctx->num_exec++;
+    if (remus_ctx->num_exec < netbuf_ctx->num_netbufs) {
+        if (exec_netbuf_script(gc, remus_ctx,
+                               "teardown", netbuf_teardown_script_cb))
+            goto end;
+        return;
+    }
+
+ end:
+    libxl__remus_teardown_done(egc, remus_ctx->dss);
+}
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__ev_child *child,
+                                   pid_t pid, int status)
+{
+    libxl__remus_ctx *remus_ctx = CONTAINER_OF(child, *remus_ctx, child);
+    char *out_path_base, *hotplug_error;
+    uint32_t domid = remus_ctx->dss->domid;
+    int devid = remus_ctx->num_exec;
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+    char *vif = netbuf_ctx->vif_list[devid];
+    char **ifb = &netbuf_ctx->ifb_list[devid];
+    int rc = ERROR_FAIL;
+
+    STATE_AO_GC(remus_ctx->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_ctx->timeout);
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    hotplug_error = libxl__xs_read(gc, XBT_NULL,
+                                   GCSPRINTF("%s/hotplug-error",
+                                             out_path_base));
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            remus_ctx->netbufscript,
+            netbuf_ctx->vif_list[devid], hotplug_error);
+        goto end;
+    }
+
+    if (status) {
+        libxl_report_child_exitstatus(CTX, LIBXL__LOG_ERROR,
+                                      remus_ctx->netbufscript,
+                                      pid, status);
+        goto end;
+    }
+
+    *ifb = libxl__xs_read(gc, XBT_NULL,
+                          GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                    libxl__xs_libxl_path(gc, domid),
+                                    devid));
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        goto end;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    remus_ctx->num_exec++;
+    if (remus_ctx->num_exec < netbuf_ctx->num_netbufs) {
+        if (exec_netbuf_script(gc, remus_ctx,
+                               "setup", netbuf_setup_script_cb))
+            goto end;
+        return;
+    }
+
+    rc = init_qdiscs(gc, remus_ctx);
+ end:
+    libxl__remus_setup_done(egc, remus_ctx->dss, rc);
+}
+
+static void netbuf_setup_timeout_cb(libxl__egc *egc,
+                                    libxl__ev_time *ev,
+                                    const struct timeval *requested_abs)
+{
+    libxl__remus_ctx *remus_ctx = CONTAINER_OF(ev, *remus_ctx, timeout);
+    int devid = remus_ctx->num_exec;
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+    char *vif = netbuf_ctx->vif_list[devid];
+
+    STATE_AO_GC(remus_ctx->dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_ctx->timeout);
+    assert(libxl__ev_child_inuse(&remus_ctx->child));
+
+    LOG(DEBUG, "killing hotplug script %s (on vif %s) because of timeout",
+        remus_ctx->netbufscript, vif);
+
+    if (kill(remus_ctx->child.pid, SIGKILL)) {
+        LOGEV(ERROR, errno, "unable to kill hotplug script %s [%ld]",
+              remus_ctx->netbufscript,
+              (unsigned long)remus_ctx->child.pid);
+    }
+
+    return;
+}
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 1;
+}
+
+/* Scan through the list of vifs belonging to domid and
+ * invoke the netbufscript to setup the IFB device & plug qdisc
+ * for each vif. Then scan through the list of IFB devices to obtain
+ * a handle on the plug qdisc installed on these IFB devices.
+ * Network output buffering is controlled via these qdiscs.
+ */
+void libxl__remus_netbuf_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+    libxl__remus_netbuf_ctx *netbuf_ctx = NULL;
+    uint32_t domid = dss->domid;
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    int num_netbufs = 0;
+    int rc = ERROR_FAIL;
+
+    STATE_AO_GC(dss->ao);
+
+    netbuf_ctx = libxl__zalloc(gc, sizeof(libxl__remus_netbuf_ctx));
+    netbuf_ctx->vif_list = get_guest_vif_list(gc, domid, &num_netbufs);
+    if (!num_netbufs) {
+        rc = 0;
+        goto end;
+    }
+
+    if (num_netbufs < 0) goto end;
+
+    netbuf_ctx->ifb_list = libxl__calloc(gc, num_netbufs,
+                                         sizeof(char *));
+    netbuf_ctx->num_netbufs = num_netbufs;
+    remus_ctx->netbuf_ctx = netbuf_ctx;
+    remus_ctx->num_exec = 0; //start at devid == 0
+    if (exec_netbuf_script(gc, remus_ctx, "setup",
+                           netbuf_setup_script_cb))
+        goto end;
+    return;
+
+ end:
+    libxl__remus_setup_done(egc, dss, rc);
+}
+
+/* Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+    STATE_AO_GC(dss->ao);
+
+    if (netbuf_ctx->qdisc_cache)
+        nl_cache_free(netbuf_ctx->qdisc_cache);
+
+    if (netbuf_ctx->nlsock)
+        nl_close(netbuf_ctx->nlsock);
+
+    remus_ctx->num_exec = 0; //start at devid == 0
+    if (exec_netbuf_script(gc, remus_ctx, "teardown",
+                           netbuf_teardown_script_cb))
+        libxl__remus_teardown_done(egc, dss);
+}
+
+#define TC_BUFFER_START 1
+#define TC_BUFFER_RELEASE 2
+static int remus_netbuf_op(libxl__gc *gc, uint32_t domid,
+			   libxl__remus_ctx *remus_ctx,
+			   int buffer_op)
+{
+    int i, ret;
+    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
+
+    for (i = 0; i < netbuf_ctx->num_netbufs; ++i) {
+        if (buffer_op == TC_BUFFER_START)
+            ret = rtnl_qdisc_plug_buffer(netbuf_ctx->netbuf_qdisc_list[i]);
+        else
+            ret = rtnl_qdisc_plug_release_one(netbuf_ctx->netbuf_qdisc_list[i]);
+
+        if (!ret)
+            ret = rtnl_qdisc_add(netbuf_ctx->nlsock,
+                                 netbuf_ctx->netbuf_qdisc_list[i],
+                                 NLM_F_REQUEST);
+        if (ret) {
+            LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+                ((buffer_op == TC_BUFFER_START) ?
+                 "start_new_epoch" : "release_prev_epoch"),
+                netbuf_ctx->ifb_list[i], nl_geterror(ret));
+            return ERROR_FAIL;
+        }
+    }
+
+    return 0;
+}
+
+int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+					libxl__remus_ctx *remus_ctx)
+{
+    return remus_netbuf_op(gc, domid, remus_ctx, TC_BUFFER_START);
+}
+
+int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+					   libxl__remus_ctx *remus_ctx)
+{
+    return remus_netbuf_op(gc, domid, remus_ctx, TC_BUFFER_RELEASE);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_nonetbuffer.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/libxl/libxl_nonetbuffer.c	Mon Nov 18 11:09:34 2013 -0800
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2013
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+    return 0;
+}
+
+/* Remus network buffer related stubs */
+void libxl__remus_netbuf_setup(libxl__egc *egc,
+                               libxl__domain_suspend_state *dss)
+{
+}
+
+void libxl__remus_netbuf_teardown(libxl__egc *egc,
+                                  libxl__domain_suspend_state *dss)
+{
+}
+
+int libxl__remus_netbuf_start_new_epoch(libxl__gc *gc, uint32_t domid,
+					libxl__remus_ctx *remus_ctx)
+{
+    LOG(ERROR, "Remus: No support for network buffering");
+    return ERROR_FAIL;
+}
+
+int libxl__remus_netbuf_release_prev_epoch(libxl__gc *gc, uint32_t domid,
+					   libxl__remus_ctx *remus_ctx)
+{
+    LOG(ERROR, "Remus: No support for network buffering");
+    return ERROR_FAIL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Mon Nov 18 10:17:35 2013 -0800
+++ b/tools/libxl/libxl_types.idl	Mon Nov 18 11:09:34 2013 -0800
@@ -558,6 +558,8 @@ libxl_domain_remus_info = Struct("domain
     ("interval",     integer),
     ("blackhole",    bool),
     ("compression",  bool),
+    ("netbuf",       bool),
+    ("netbufscript", string),
     ])
 
 libxl_event_type = Enumeration("event_type", [

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks
  2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
  2013-11-18 20:03 ` [PATCH 1 of 4 V5] remus: add libnl3 dependency to autoconf scripts Shriram Rajagopalan
  2013-11-18 20:03 ` [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code Shriram Rajagopalan
@ 2013-11-18 20:03 ` Shriram Rajagopalan
  2013-11-25 15:38   ` Ian Jackson
  2013-11-18 20:03 ` [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch Shriram Rajagopalan
  2013-11-19 16:23 ` [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Ian Campbell
  4 siblings, 1 reply; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-18 20:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monne, Ian Jackson, Ian Campbell,
	Stefano Stabellini

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1384801802 28800
# Node ID 94eea030e0095ebd62bd6b40e02a819fdb778c82
# Parent  5b4e029bb79b8db4210a182911a4ff7ec7957c4f
tools/libxl: Control network buffering in remus callbacks

This patch constitutes the core network buffering logic.
and does the following:
 a) create a new network buffer when the domain is suspended
    (remus_domain_suspend_callback)
 b) release the previous network buffer pertaining to the
    committed checkpoint (remus_domain_checkpoint_dm_saved)

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 5b4e029bb79b -r 94eea030e009 tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c	Mon Nov 18 11:09:34 2013 -0800
+++ b/tools/libxl/libxl_dom.c	Mon Nov 18 11:10:02 2013 -0800
@@ -1277,8 +1277,27 @@ void libxl__remus_teardown_done(libxl__e
 
 static int libxl__remus_domain_suspend_callback(void *data)
 {
-    /* REMUS TODO: Issue disk and network checkpoint reqs. */
-    return libxl__domain_suspend_common_callback(data);
+    libxl__save_helper_state *shs = data;
+    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    STATE_AO_GC(dss->ao);
+
+    /* REMUS TODO: Issue disk checkpoint reqs. */
+    int ok = libxl__domain_suspend_common_callback(data);
+
+    if (!remus_ctx->netbuf_ctx || !ok) goto out;
+
+    /* The domain was suspended successfully. Start a new network
+     * buffer for the next epoch. If this operation fails, then act
+     * as though domain suspend failed -- libxc exits its infinite
+     * loop and ultimately, the replication stops.
+     */
+    if (libxl__remus_netbuf_start_new_epoch(gc, dss->domid,
+                                            remus_ctx))
+        ok = 0;
+
+ out:
+    return ok;
 }
 
 static int libxl__remus_domain_resume_callback(void *data)
@@ -1291,7 +1310,7 @@ static int libxl__remus_domain_resume_ca
     if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
         return 0;
 
-    /* REMUS TODO: Deal with disk. Start a new network output buffer */
+    /* REMUS TODO: Deal with disk. */
     return 1;
 }
 
@@ -1300,6 +1319,9 @@ static int libxl__remus_domain_resume_ca
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc);
 
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs);
+
 static void libxl__remus_domain_checkpoint_callback(void *data)
 {
     libxl__save_helper_state *shs = data;
@@ -1318,10 +1340,51 @@ static void libxl__remus_domain_checkpoi
 static void remus_checkpoint_dm_saved(libxl__egc *egc,
                                       libxl__domain_suspend_state *dss, int rc)
 {
-    /* REMUS TODO: Wait for disk and memory ack, release network buffer */
-    /* REMUS TODO: make this asynchronous */
-    assert(!rc); /* REMUS TODO handle this error properly */
-    usleep(dss->remus_ctx->interval * 1000);
+    /*
+     * REMUS TODO: Wait for disk and explicit memory ack (through restore
+     * callback from remote) before releasing network buffer.
+     */
+    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
+    STATE_AO_GC(dss->ao);
+
+    if (rc) {
+        LOG(ERROR, "Failed to save device model. Terminating Remus..");
+        goto out;
+    }
+
+    if (remus_ctx->netbuf_ctx) {
+        rc = libxl__remus_netbuf_release_prev_epoch(gc, dss->domid,
+                                                     remus_ctx);
+        if (rc) {
+            LOG(ERROR, "Failed to release network buffer."
+                " Terminating Remus..");
+            goto out;
+        }
+    }
+
+    /* Set checkpoint interval timeout */
+    rc = libxl__ev_time_register_rel(gc, &remus_ctx->timeout,
+                                     remus_next_checkpoint,
+                                     dss->remus_ctx->interval);
+    if (rc) {
+        LOG(ERROR, "unable to register timeout for next epoch."
+            " Terminating Remus..");
+        goto out;
+    }
+    return;
+
+ out:
+    libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+                                  const struct timeval *requested_abs)
+{
+    libxl__remus_ctx *remus_ctx = CONTAINER_OF(ev, *remus_ctx, timeout);
+    libxl__domain_suspend_state *dss = remus_ctx->dss;
+    STATE_AO_GC(dss->ao);
+
+    libxl__ev_time_deregister(gc, &remus_ctx->timeout);
     libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
 }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
                   ` (2 preceding siblings ...)
  2013-11-18 20:03 ` [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks Shriram Rajagopalan
@ 2013-11-18 20:03 ` Shriram Rajagopalan
  2013-11-25 15:37   ` Ian Jackson
  2013-11-19 16:23 ` [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Ian Campbell
  4 siblings, 1 reply; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-18 20:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monne, Ian Jackson, Ian Campbell,
	Stefano Stabellini

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1384804015 28800
# Node ID 3c11efb5e8fef9f070272edefed47f31b7c619ef
# Parent  94eea030e0095ebd62bd6b40e02a819fdb778c82
tools/xl: Remus - Network buffering cmdline switch

Command line switch to 'xl remus' command, to enable network buffering.
Pass on this flag to libxl so that it can act accordingly.
Also update man pages to reflect the addition of a new option to
'xl remus' command.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 94eea030e009 -r 3c11efb5e8fe docs/man/xl.conf.pod.5
--- a/docs/man/xl.conf.pod.5	Mon Nov 18 11:10:02 2013 -0800
+++ b/docs/man/xl.conf.pod.5	Mon Nov 18 11:46:55 2013 -0800
@@ -105,6 +105,12 @@ Configures the default gateway device to
 
 Default: C<None>
 
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
 =item B<output_format="json|sxp">
 
 Configures the default output format used by xl when printing "machine
diff -r 94eea030e009 -r 3c11efb5e8fe docs/man/xl.pod.1
--- a/docs/man/xl.pod.1	Mon Nov 18 11:10:02 2013 -0800
+++ b/docs/man/xl.pod.1	Mon Nov 18 11:46:55 2013 -0800
@@ -398,8 +398,7 @@ Print huge (!) amount of debug during th
 Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
 mechanism between the two hosts.
 
-N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
-     There is no support for network or disk buffering at the moment.
+N.B: There is no support for disk buffering at the moment.
 
 B<OPTIONS>
 
@@ -418,6 +417,18 @@ Generally useful for debugging.
 
 Disable memory checkpoint compression.
 
+=item B<-n>
+
+Enable network output buffering.  The default script used to configure
+network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to
+use a custom script, use the I<-N> option or set the global variable
+I<remus.default.netbufscript> in /etc/xen/xl.conf to point to your script.
+
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to setup network buffering instead of the instead of
+the default (/etc/xen/scripts/remus-netbuf-setup).
+
 =item B<-s> I<sshcommand>
 
 Use <sshcommand> instead of ssh.  String will be passed to sh.
diff -r 94eea030e009 -r 3c11efb5e8fe tools/libxl/xl.c
--- a/tools/libxl/xl.c	Mon Nov 18 11:10:02 2013 -0800
+++ b/tools/libxl/xl.c	Mon Nov 18 11:46:55 2013 -0800
@@ -46,6 +46,7 @@ char *default_vifscript = NULL;
 char *default_bridge = NULL;
 char *default_gatewaydev = NULL;
 char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 
@@ -177,6 +178,9 @@ static void parse_global_config(const ch
     if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
         claim_mode = l;
 
+    xlu_cfg_replace_string (config, "remus.default.netbufscript",
+                            &default_remus_netbufscript, 0);
+
     xlu_cfg_destroy(config);
 }
 
diff -r 94eea030e009 -r 3c11efb5e8fe tools/libxl/xl.h
--- a/tools/libxl/xl.h	Mon Nov 18 11:10:02 2013 -0800
+++ b/tools/libxl/xl.h	Mon Nov 18 11:46:55 2013 -0800
@@ -152,6 +152,7 @@ extern char *default_vifscript;
 extern char *default_bridge;
 extern char *default_gatewaydev;
 extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
 extern char *blkdev_start;
 
 enum output_format {
diff -r 94eea030e009 -r 3c11efb5e8fe tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Mon Nov 18 11:10:02 2013 -0800
+++ b/tools/libxl/xl_cmdimpl.c	Mon Nov 18 11:46:55 2013 -0800
@@ -7116,8 +7116,9 @@ int main_remus(int argc, char **argv)
     r_info.interval = 200;
     r_info.blackhole = 0;
     r_info.compression = 1;
-
-    SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+    r_info.netbuf = 0;
+
+    SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) {
     case 'i':
         r_info.interval = atoi(optarg);
         break;
@@ -7127,6 +7128,12 @@ int main_remus(int argc, char **argv)
     case 'u':
         r_info.compression = 0;
         break;
+    case 'n':
+        r_info.netbuf = 1;
+        break;
+    case 'N':
+        r_info.netbufscript = optarg;
+        break;
     case 's':
         ssh_command = optarg;
         break;
@@ -7138,6 +7145,9 @@ int main_remus(int argc, char **argv)
     domid = find_domain(argv[optind]);
     host = argv[optind + 1];
 
+    if(!r_info.netbufscript)
+        r_info.netbufscript = default_remus_netbufscript;
+
     if (r_info.blackhole) {
         send_fd = open("/dev/null", O_RDWR, 0644);
         if (send_fd < 0) {
@@ -7175,13 +7185,19 @@ int main_remus(int argc, char **argv)
     /* Point of no return */
     rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
 
-    /* If we are here, it means backup has failed/domain suspend failed.
-     * Try to resume the domain and exit gracefully.
+    /* check if the domain exists. User may have xl destroyed the
+     * domain to force failover
+     */
+    if (libxl_domain_info(ctx, 0, domid)) {
+        fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+        close(send_fd);
+        return 0;
+    }
+
+    /* If we are here, it means remus setup/domain suspend/backup has
+     * failed. Try to resume the domain and exit gracefully.
      * TODO: Split-Brain check.
      */
-    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
-            " (rc=%d)\n", rc);
-
     if (rc == ERROR_GUEST_TIMEDOUT)
         fprintf(stderr, "Failed to suspend domain at primary.\n");
     else {
diff -r 94eea030e009 -r 3c11efb5e8fe tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c	Mon Nov 18 11:10:02 2013 -0800
+++ b/tools/libxl/xl_cmdtable.c	Mon Nov 18 11:46:55 2013 -0800
@@ -481,6 +481,9 @@ struct cmd_spec cmd_table[] = {
       "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
       "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
       "-u                      Disable memory checkpoint compression.\n"
+      "-n                      Enable network output buffering.\n"
+      "-N <netbufscript>       Use netbufscript to setup network buffering instead of the\n"
+      "                        instead of the default (/etc/xen/scripts/remus-netbuf-setup).\n"
       "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
       "                        to sh. If empty, run <host> instead of \n"
       "                        ssh <host> xl migrate-receive -r [-e]\n"

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support
  2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
                   ` (3 preceding siblings ...)
  2013-11-18 20:03 ` [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch Shriram Rajagopalan
@ 2013-11-19 16:23 ` Ian Campbell
  2013-11-19 17:58   ` Shriram Rajagopalan
  4 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-11-19 16:23 UTC (permalink / raw)
  To: Shriram Rajagopalan
  Cc: Stefano Stabellini, Andrew Cooper, Ian Jackson, George Dunlap,
	xen-devel, Roger Pau Monne

On Mon, 2013-11-18 at 12:03 -0800, Shriram Rajagopalan wrote:
> This patch series adds support for network buffering in the Remus
> codebase in libxl.  NB: This series does not contain the
> bug fix related to usleep calls in libxl__domain_suspend.
> I will send it out later as a separate series.
> 
> Changes in V5:
> 
> Merge hotplug script patch (2/5) and hotplug script setup/teardown
> patch (3/5) into a single patch.

This has dropped to 4 patches vs 7 in v4. One of the delta is this
merging, what happened to the other two? Especially given that Ian J
asked for #7 to be split last time around.

>From an initial glance this looks like code this time around is that
right? I know some stuff is to come later but IIRC that wasn't in the
previous series either.

>From a release point of view (George CCd) it looks like it only touches
Remus code other than a few little changes in the generic save/restore
code which are guarded by (in effect) if (remus). Is that correct?

If the only risk is to Remus users and the plus side is a far more
useful Remus then I'd say we should make a freeze exception, subject to
the code being reviewed of course.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support
  2013-11-19 16:23 ` [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Ian Campbell
@ 2013-11-19 17:58   ` Shriram Rajagopalan
  2013-11-24  6:51     ` Shriram Rajagopalan
  0 siblings, 1 reply; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-19 17:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, Andrew Cooper, Ian Jackson, George Dunlap,
	xen-devel@lists.xen.org, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1724 bytes --]

On Tuesday, November 19, 2013, Ian Campbell wrote:

> On Mon, 2013-11-18 at 12:03 -0800, Shriram Rajagopalan wrote:
> > This patch series adds support for network buffering in the Remus
> > codebase in libxl.  NB: This series does not contain the
> > bug fix related to usleep calls in libxl__domain_suspend.
> > I will send it out later as a separate series.
> >
> > Changes in V5:
> >
> > Merge hotplug script patch (2/5) and hotplug script setup/teardown
> > patch (3/5) into a single patch.
>
> This has dropped to 4 patches vs 7 in v4. One of the delta is this
> merging, what happened to the other two? Especially given that Ian J
> asked for #7 to be split last time around.
>
>
Two patches in v4 are related solely to the bug fix. The first was by Ian
while the other was by me, which Ian J asked me to split.

>From an initial glance this looks like code this time around is that
> right? I know some stuff is to come later but IIRC that wasn't in the
> previous series either.
>
>
This is basically same as v3, with all the feedback.


> >From a release point of view (George CCd) it looks like it only touches
> Remus code other than a few little changes in the generic save/restore
> code which are guarded by (in effect) if (remus). Is that correct?


Yep.


>
> If the only risk is to Remus users and the plus side is a far more
> useful Remus then I'd say we should make a freeze exception, subject to
> the code being reviewed of course.


Yes, these patches are confined totally to Remus' functionality in libxl.

The other bug fix series otoh may need more testing as it touches domain
suspend -- something core to both live migration and Remus. But that is a
bug fix, not a feature addition.


> Ian.
>
>

[-- Attachment #1.2: Type: text/html, Size: 2612 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support
  2013-11-19 17:58   ` Shriram Rajagopalan
@ 2013-11-24  6:51     ` Shriram Rajagopalan
  0 siblings, 0 replies; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-11-24  6:51 UTC (permalink / raw)
  To: Ian Campbell, George Dunlap
  Cc: Andrew Cooper, Stefano Stabellini, Ian Jackson, Roger Pau Monne,
	xen-devel@lists.xen.org

On Tue, Nov 19, 2013 at 11:58 AM, Shriram Rajagopalan
<rshriram@cs.ubc.ca> wrote:
> On Tuesday, November 19, 2013, Ian Campbell wrote:
>>
>> On Mon, 2013-11-18 at 12:03 -0800, Shriram Rajagopalan wrote:
>> > This patch series adds support for network buffering in the Remus
>> > codebase in libxl.  NB: This series does not contain the
>> > bug fix related to usleep calls in libxl__domain_suspend.
>> > I will send it out later as a separate series.
>> >
>> > Changes in V5:
>> >
>> > Merge hotplug script patch (2/5) and hotplug script setup/teardown
>> > patch (3/5) into a single patch.
>>
>> This has dropped to 4 patches vs 7 in v4. One of the delta is this
>> merging, what happened to the other two? Especially given that Ian J
>> asked for #7 to be split last time around.
>>
>
> Two patches in v4 are related solely to the bug fix. The first was by Ian
> while the other was by me, which Ian J asked me to split.
>
>> >From an initial glance this looks like code this time around is that
>> right? I know some stuff is to come later but IIRC that wasn't in the
>> previous series either.
>>
>
> This is basically same as v3, with all the feedback.
>
>>
>> >From a release point of view (George CCd) it looks like it only touches
>> Remus code other than a few little changes in the generic save/restore
>> code which are guarded by (in effect) if (remus). Is that correct?
>
>
> Yep.
>
>>
>>
>> If the only risk is to Remus users and the plus side is a far more
>> useful Remus then I'd say we should make a freeze exception, subject to
>> the code being reviewed of course.
>
>
> Yes, these patches are confined totally to Remus' functionality in libxl.
>
> The other bug fix series otoh may need more testing as it touches domain
> suspend -- something core to both live migration and Remus. But that is a
> bug fix, not a feature addition.
>
>>
>> Ian.
>>
>

ping!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code
  2013-11-18 20:03 ` [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code Shriram Rajagopalan
@ 2013-11-25 15:32   ` Ian Jackson
  2013-12-19 19:37     ` Shriram Rajagopalan
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2013-11-25 15:32 UTC (permalink / raw)
  To: Shriram Rajagopalan
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel

Shriram Rajagopalan writes ("[PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code"):
> tools/libxl: Remus network buffering - hotplug scripts and setup code

Thanks.  This is going in the right direction, but I think there's
still some way to go.


One thing that makes reviewing it particularly difficult is the
physical ordering of the new code in your patch.

The callback programming model makes it easy to write spaghetti logic;
to counter this we try to have a firm structure to each set of event
handling code:

 * Firstly, function forward declarations, types, structs, etc.

 * Then helper functions which have a simple "call this and it
   does something and returns" control flow.

 * Then the main body of the event callbacks, in the order in
   which they will normally be called.  Obviously there is some
   judgement needed because the call flow won't always be entirely
   linear.  But keeping the layout as linear as possible means that
   one is never hunting for the next function.

 * Call flow of a single operation does not bounce back and forth
   between files.  If this is necessary a formal sub-operation is
   defined (see how it's done with the bootloader, for example).

I think you would be able to structure the code this way.  Would you
do that please ?

I think you may need to move several of the entrypoints into a new
"remus" file.  The "netbuf" file may need to become a more formally
separated part of the suspend/resume function.

We also try to break out sub-operations, giving them their own types
and callbacks.


> diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/Makefile
> --- a/tools/libxl/Makefile	Mon Nov 18 10:17:35 2013 -0800
> +++ b/tools/libxl/Makefile	Mon Nov 18 11:09:34 2013 -0800
> @@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
> -static void remus_failover_cb(libxl__egc *egc,
> -                              libxl__domain_suspend_state *dss, int rc);
> +static void remus_replication_failure_cb(libxl__egc *egc,
> +                                         libxl__domain_suspend_state *dss,
> +                                         int rc);
...
> -    dss->remus = info;

This change from dss->remus to dss->remus_ctx needs a mention in the
commit message.  To be honest, I'm not sure why you don't just change
the type of the remus member.

> diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_internal.h
> --- a/tools/libxl/libxl_internal.h	Mon Nov 18 10:17:35 2013 -0800
> +++ b/tools/libxl/libxl_internal.h	Mon Nov 18 11:09:34 2013 -0800
> @@ -2291,6 +2291,50 @@ typedef struct libxl__logdirty_switch {
>      libxl__ev_time timeout;
>  } libxl__logdirty_switch;
>  
> +typedef struct libxl__remus_ctx {
> +    /* checkpoint interval */
> +    int interval;
> +    int blackhole;
> +    int compression;
> +    int saved_rc;
> +    /* Script to setup/teardown network buffers */
> +    const char *netbufscript;
> +    /* Opaque context containing network buffer related stuff */
> +    void *netbuf_ctx;
> +    libxl__domain_suspend_state *dss;
> +    libxl__ev_time timeout;
> +    libxl__ev_child child;
> +    int num_exec;
> +} libxl__remus_ctx;

I think you should probably call this a "libxl__remus_state", to
correspond to all of the other "libxl__foo_state" structs.

You also need to specify in comments who fills in what parts of it, as
is done in the other libxl__foo_state's.


> diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_netbuffer.c
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/tools/libxl/libxl_netbuffer.c	Mon Nov 18 11:09:34 2013 -0800
> @@ -0,0 +1,516 @@
...
> +typedef struct libxl__remus_netbuf_ctx {
> +    struct rtnl_qdisc **netbuf_qdisc_list;
> +    struct nl_sock *nlsock;
> +    struct nl_cache *qdisc_cache;
> +    char **vif_list;
> +    char **ifb_list;
> +    uint32_t num_netbufs;
> +    uint32_t unused;
> +} libxl__remus_netbuf_ctx;

Presumably this wants to be a libxl__remus_netbuf_state, then.

The callbacks from the remus and netbuf code to the suspend code
should not be hardwired.  Instead, function pointers should be passed,
the way the domain creation stuff does with a bootloader, and the way
the bootloader does with openpty.

This may seem like extra complication if there is only one caller and
it will always pass the same function, but it makes the logical
separation much clearer.  It's analogous to the separation between an
ordinary function and its caller, even if there is in fact only one
call site.

That is, functions like this

+_hidden void libxl__remus_setup_done(libxl__egc *egc,
+                                     libxl__domain_suspend_state *dss,
+                                     int rc);

should be private functions in the remus file which make the
appropriate callback.


> +/* If the device has a vifname, then use that instead of
> + * the vifX.Y format.
> + */
> +static char *get_vifname(libxl__gc *gc, uint32_t domid,
> +                               libxl_device_nic *nic)
> +{
> +    const char *vifname = NULL;
> +    vifname = libxl__xs_read(gc, XBT_NULL,
> +                             libxl__sprintf(gc,
> +                                            "%s/backend/vif/%d/%d/vifname",
> +                                            libxl__xs_get_dompath(gc, 0),
> +                                            domid, nic->devid));

Why not libxl__xs_read_checked ?

> +    if (!vifname) {

Surely this error handling is wrong.  NULL here might mean "failed" or
ENOENT.  If it means failed then get_vifname needs to return the error
to its caller, by returning NULL.

> +        vifname = libxl__device_nic_devname(gc, domid,
> +                                            nic->devid,
> +                                            nic->nictype);
> +    }
> +
> +    return libxl__strdup(gc, vifname);

Why this strdup ?

> +static char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
> +                                 int *num_vifs)
> +{
> +    libxl_device_nic *nics = NULL;
> +    int nb, i = 0;
> +    char **vif_list = NULL;
> +
> +    *num_vifs = 0;
> +    nics = libxl_device_nic_list(CTX, domid, &nb);
> +    if (!nics) return NULL;
> +
> +    /* Ensure that none of the vifs are backed by driver domains */
> +    for (i = 0; i < nb; i++) {
> +        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
> +            LOG(ERROR, "vif %s has driver domain (%u) as its backend.\n"
> +                "Network buffering is not supported with driver domains",

The LOG macros should not normally be passed messages with newlines
in.

> +                get_vifname(gc, domid, &nics[i]), nics[i].backend_domid);
> +            *num_vifs = -1;

This calling convention is strange.  I think get_guest_vif_list should
return a libxl rc value.

> +    vif_list = libxl__calloc(gc, nb, sizeof(char *));
> +    for (i = 0; i < nb; ++i)
> +        vif_list[i] = get_vifname(gc, domid, &nics[i]);

What if get_vifname fails ?

...

> +
> +static int init_qdiscs(libxl__gc *gc,
> +                       libxl__remus_ctx *remus_ctx)
> +{
> +    int i, ret, ifindex;
> +    struct rtnl_link *ifb = NULL;
> +    struct rtnl_qdisc *qdisc = NULL;
...
> +    /* convenience vars */

We use the term "Convenience aliases".  It's best to use the exact
same terminology as elsewhere.

> +    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
> +    int num_netbufs = netbuf_ctx->num_netbufs;
> +    char **ifb_list = netbuf_ctx->ifb_list;

These should all be declared const.

> +    /* list of handles to plug qdiscs */
> +    netbuf_ctx->netbuf_qdisc_list =
> +        libxl__calloc(gc, num_netbufs,
> +                      sizeof(struct rtnl_qdisc *));

Please use GCNEW_ARRAY.

> +        if (qdisc) {
> +            const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
> +            /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
> +            if (!tc_kind || strcmp(tc_kind, "plug")) {
> +                LOG(ERROR, "plug qdisc is not installed on %s", ifb_list[i]);

You seem to assume that all the failures from rtnl_tc_get_kind are the
due to the expected cause.  You need to check errno, I think.

> +    rtnl_link_put(ifb);
> +    return 0;
> +
> + end:

We call these "out", not "end".  This should be changed everywhere.

> +    if (ifb) rtnl_link_put(ifb);
> +    return ERROR_FAIL;

Can't this leak netbuf_ctx->nlsock and netbuf_ctx->qdisc_cache ?


> +}
> +
> +/* the script needs the following env & args
> + * $vifname
> + * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
> + * $IFB (for teardown)
> + * setup/teardown as command line arg.
> + * In return, the script writes the name of IFB device (during setup) to be
> + * used for output buffering into XENBUS_PATH/ifb
> + */

This information should be somewhere else, not buried in the libxl
source code.  For now, could it be in the one script which you
actually supply ?


> +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx *remus_ctx,
> +                              char *op, libxl__ev_child_callback *death)
> +{
> +    int arraysize, nr = 0;
> +    char **env = NULL, **args = NULL;
> +    pid_t pid;

This code has many important similarities to the device_hotplug
functions in libxl_device.c.

For example your timeout callback is pretty much identical to
device_hotplug_timeout_cb.

I think you should consider whether you can use libxl__ao_device, and
perhaps share other parts of that code.


> +    libxl__ev_child_init(child);

This should be done where the remus state is first populated, not
here.


> +    char *script = libxl__strdup(gc, remus_ctx->netbufscript);

These next lot are all convenience aliases AFIACT.  This should be
mentioned, and they should all be const:

> +    libxl__ev_child *child = &remus_ctx->child;
> +    libxl__ev_time *timeout = &remus_ctx->timeout;
> +    uint32_t domid = remus_ctx->dss->domid;
> +    int devid = remus_ctx->num_exec;
> +    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
> +    char *vif = netbuf_ctx->vif_list[devid];
> +    char *ifb = netbuf_ctx->ifb_list[devid];

So instead, something like:

  +    libxl__ev_child *const child = &remus_ctx->child;
...
  +    const uint32_t domid = remus_ctx->dss->domid;
...
  +    char *const ifb = netbuf_ctx->ifb_list[devid];

There are other places in the code with a similar pattern which also
need to be changed.


> +    if (!strcmp(op, "setup"))
> +        arraysize = 5;
> +    else
> +        arraysize = 7;

Perhaps we could do away with this if() and just always allocate 7 ?

> +    GCNEW_ARRAY(env, arraysize);
> +    env[nr++] = "vifname";
> +    env[nr++] = vif;
> +    env[nr++] = "XENBUS_PATH";
> +    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
> +                          libxl__xs_libxl_path(gc, domid), devid);
> +    if (!strcmp(op, "teardown")) {

I think you should pass op as an enum, and convert it to a string
right at the end.  That would do away with these strcmps.

> +    remus_ctx->num_exec++;

num_exec is a misleading variable name here.  In libxl_linux.c and
libxl_device.c it refers to the number of times we have run the
hotplug script _for each device_, not the iteration over the number of
devices.

> +static void netbuf_setup_script_cb(libxl__egc *egc,
> +                                   libxl__ev_child *child,
> +                                   pid_t pid, int status)
> +{
...
> +    hotplug_error = libxl__xs_read(gc, XBT_NULL,
> +                                   GCSPRINTF("%s/hotplug-error",
> +                                             out_path_base));

Again, libxl__xs_read_checked (several times) ?

> +static void netbuf_setup_timeout_cb(libxl__egc *egc,
> +                                    libxl__ev_time *ev,
> +                                    const struct timeval *requested_abs)
> +{

This seems to be yet another copy of device_hotplug_timeout_cb.

> +void libxl__remus_netbuf_setup(libxl__egc *egc,
> +                               libxl__domain_suspend_state *dss)
> +{
> +    libxl__remus_netbuf_ctx *netbuf_ctx = NULL;
> +    uint32_t domid = dss->domid;
> +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> +    int num_netbufs = 0;
> +    int rc = ERROR_FAIL;
...
> +    netbuf_ctx = libxl__zalloc(gc, sizeof(libxl__remus_netbuf_ctx));

Please use GCNEW, not libxl__zalloc, when you can.

> +/* Note: This function will be called in the same gc context as
> + * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
> + * API call.
> + */
> +void libxl__remus_netbuf_teardown(libxl__egc *egc,
> +                                  libxl__domain_suspend_state *dss)
> +{
...
> +    remus_ctx->num_exec = 0; //start at devid == 0
> +    if (exec_netbuf_script(gc, remus_ctx, "teardown",
> +                           netbuf_teardown_script_cb))
> +        libxl__remus_teardown_done(egc, dss);

I think the return value from exec_netbuf_script should be put into an
rc variable.  And we don't correctly propagate the rc here AFAICT.

> +#define TC_BUFFER_START 1
> +#define TC_BUFFER_RELEASE 2

Are these #defines not supposed to come from some Linux header file ?
I don't think having them as constants here can be right.


Thanks,
Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-11-18 20:03 ` [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch Shriram Rajagopalan
@ 2013-11-25 15:37   ` Ian Jackson
  2013-12-12  5:45     ` Shriram Rajagopalan
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2013-11-25 15:37 UTC (permalink / raw)
  To: Shriram Rajagopalan
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel

Shriram Rajagopalan writes ("[PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch"):
> tools/xl: Remus - Network buffering cmdline switch
> 
> Command line switch to 'xl remus' command, to enable network buffering.
> Pass on this flag to libxl so that it can act accordingly.
> Also update man pages to reflect the addition of a new option to
> 'xl remus' command.

Shouldn't enabling network buffering be the default ?

Is it really useful to have a command-line option to change the script ?

> diff -r 94eea030e009 -r 3c11efb5e8fe docs/man/xl.pod.1
> --- a/docs/man/xl.pod.1	Mon Nov 18 11:10:02 2013 -0800
> +++ b/docs/man/xl.pod.1	Mon Nov 18 11:46:55 2013 -0800
> @@ -398,8 +398,7 @@ Print huge (!) amount of debug during th
>  Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
>  mechanism between the two hosts.
>  
> -N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
> -     There is no support for network or disk buffering at the moment.
> +N.B: There is no support for disk buffering at the moment.

I think you need to keep the "experimental (proof-of-concept)" note.
Without disk buffering, surely any VM which uses remus might corrupt
its disk ?

> +=item B<-n>
> +
> +Enable network output buffering.  The default script used to configure
> +network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to
> +use a custom script, use the I<-N> option or set the global variable
> +I<remus.default.netbufscript> in /etc/xen/xl.conf to point to your script.

There is no need to mention the default script again in this
paragraph.

I seem to have missed something, but should it be possible to specify
the script individually per domain ?

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks
  2013-11-18 20:03 ` [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks Shriram Rajagopalan
@ 2013-11-25 15:38   ` Ian Jackson
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2013-11-25 15:38 UTC (permalink / raw)
  To: Shriram Rajagopalan
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel

Shriram Rajagopalan writes ("[PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks"):
> tools/libxl: Control network buffering in remus callbacks

I had a quick look at this and some of my style/structure comments
on 2/4 seem to apply to this too.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-11-25 15:37   ` Ian Jackson
@ 2013-12-12  5:45     ` Shriram Rajagopalan
  2013-12-12 11:07       ` Ian Campbell
  2013-12-12 15:12       ` Ian Jackson
  0 siblings, 2 replies; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-12-12  5:45 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org


[-- Attachment #1.1: Type: text/plain, Size: 2538 bytes --]

On Mon, Nov 25, 2013 at 7:37 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:

> Shriram Rajagopalan writes ("[PATCH 4 of 4 V5] tools/xl: Remus - Network
> buffering cmdline switch"):
> > tools/xl: Remus - Network buffering cmdline switch
> >
> > Command line switch to 'xl remus' command, to enable network buffering.
> > Pass on this flag to libxl so that it can act accordingly.
> > Also update man pages to reflect the addition of a new option to
> > 'xl remus' command.
>
> Shouldn't enabling network buffering be the default ?
>
>
Given that network buffering support is conditionally compiled into
libxl, it makes sense to let the user explicitly enable network buffering.
If network buffering was enabled by default, but libxl was missing the
feature,
then running xl remus (without any option) would immediately fail saying
network
buffering support is missing. Might as well make people enable it
explicitly and
then discover the lack of support for it inside libxl.


> Is it really useful to have a command-line option to change the script ?
>
>
That was added on Ian C's suggestion. It might be useful if people have
openvswitch based setups and want to tweak the network buffering setup.


> > diff -r 94eea030e009 -r 3c11efb5e8fe docs/man/xl.pod.1
> > --- a/docs/man/xl.pod.1       Mon Nov 18 11:10:02 2013 -0800
> > +++ b/docs/man/xl.pod.1       Mon Nov 18 11:46:55 2013 -0800
> > @@ -398,8 +398,7 @@ Print huge (!) amount of debug during th
> >  Enable Remus HA for domain. By default B<xl> relies on ssh as a
> transport
> >  mechanism between the two hosts.
> >
> > -N.B: Remus support in xl is still in experimental (proof-of-concept)
> phase.
> > -     There is no support for network or disk buffering at the moment.
> > +N.B: There is no support for disk buffering at the moment.
>
> I think you need to keep the "experimental (proof-of-concept)" note.
> Without disk buffering, surely any VM which uses remus might corrupt
> its disk ?
>
>
agreed.


> > +=item B<-n>
> > +
> > +Enable network output buffering.  The default script used to configure
> > +network buffering is /etc/xen/scripts/remus-netbuf-setup. If you wish to
> > +use a custom script, use the I<-N> option or set the global variable
> > +I<remus.default.netbufscript> in /etc/xen/xl.conf to point to your
> script.
>
> There is no need to mention the default script again in this
> paragraph.
>
> I seem to have missed something, but should it be possible to specify
> the script individually per domain ?
>
>
using the -N option.


>  Ian.
>
>

[-- Attachment #1.2: Type: text/html, Size: 3999 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12  5:45     ` Shriram Rajagopalan
@ 2013-12-12 11:07       ` Ian Campbell
  2013-12-12 15:13         ` Ian Jackson
  2013-12-12 15:12       ` Ian Jackson
  1 sibling, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2013-12-12 11:07 UTC (permalink / raw)
  To: rshriram
  Cc: Andrew Cooper, Stefano Stabellini, Ian Jackson, Roger Pau Monne,
	xen-devel@lists.xen.org

On Wed, 2013-12-11 at 21:45 -0800, Shriram Rajagopalan wrote:
> On Mon, Nov 25, 2013 at 7:37 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>         Shriram Rajagopalan writes ("[PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch"):
>         > tools/xl: Remus - Network buffering cmdline switch
>         >
>         > Command line switch to 'xl remus' command, to enable network buffering.
>         > Pass on this flag to libxl so that it can act accordingly.
>         > Also update man pages to reflect the addition of a new option to
>         > 'xl remus' command.

>         Shouldn't enabling network buffering be the default ?

> Given that network buffering support is conditionally compiled into
> libxl, it makes sense to let the user explicitly enable network buffering.
> If network buffering was enabled by default, but libxl was missing the feature,
> then running xl remus (without any option) would immediately fail saying network 
> buffering support is missing. Might as well make people enable it explicitly and 
> then discover the lack of support for it inside libxl. 

The default should be set depending on whether support was enabled in
libxl or not. libxl_defbool is the mechanism which allows this sort of
thing to be decided at the library level.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12  5:45     ` Shriram Rajagopalan
  2013-12-12 11:07       ` Ian Campbell
@ 2013-12-12 15:12       ` Ian Jackson
  2013-12-12 18:37         ` Shriram Rajagopalan
  1 sibling, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2013-12-12 15:12 UTC (permalink / raw)
  To: rshriram
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

Shriram Rajagopalan writes ("Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch"):
> On Mon, Nov 25, 2013 at 7:37 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>     Shriram Rajagopalan writes ("[PATCH 4 of 4 V5] tools/xl: Remus - Network
>     buffering cmdline switch"):
>     > tools/xl: Remus - Network buffering cmdline switch
...
>     Shouldn't enabling network buffering be the default ?
>

> Given that network buffering support is conditionally compiled into
> libxl, it makes sense to let the user explicitly enable network
> buffering.  If network buffering was enabled by default, but libxl
> was missing the feature, then running xl remus (without any option)
> would immediately fail saying network buffering support is
> missing. Might as well make people enable it explicitly and then
> discover the lack of support for it inside libxl.

Is Remus actually useable without network buffering ?  I don't think
we would want to change the default later, and clearly the default
should be to behave correctly.

>     Is it really useful to have a command-line option to change the script ?
> 
> That was added on Ian C's suggestion. It might be useful if people have 
> openvswitch based setups and want to tweak the network buffering setup.

I think this should be a per-vif configuration option, with a default
taken from the domain and/or global config.  Not a command-line
option.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 11:07       ` Ian Campbell
@ 2013-12-12 15:13         ` Ian Jackson
  2013-12-12 15:36           ` Ian Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2013-12-12 15:13 UTC (permalink / raw)
  To: Ian Campbell
  Cc: rshriram, Andrew Cooper, Stefano Stabellini, Roger Pau Monne,
	xen-devel@lists.xen.org

Ian Campbell writes ("Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch"):
> The default should be set depending on whether support was enabled in
> libxl or not. libxl_defbool is the mechanism which allows this sort of
> thing to be decided at the library level.

I don't think I agree with this.

As I understand it, operating without network buffering could lead to
strange behaviours by the domain when it fails over - violations of
network protocols by "the" VM.

The default should be to operate correctly, and if correct operation
is not supported the default should be to fail.  (Cf security.)

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 15:13         ` Ian Jackson
@ 2013-12-12 15:36           ` Ian Campbell
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2013-12-12 15:36 UTC (permalink / raw)
  To: Ian Jackson
  Cc: rshriram, Andrew Cooper, Stefano Stabellini, Roger Pau Monne,
	xen-devel@lists.xen.org

On Thu, 2013-12-12 at 15:13 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch"):
> > The default should be set depending on whether support was enabled in
> > libxl or not. libxl_defbool is the mechanism which allows this sort of
> > thing to be decided at the library level.
> 
> I don't think I agree with this.
> 
> As I understand it, operating without network buffering could lead to
> strange behaviours by the domain when it fails over - violations of
> network protocols by "the" VM.
> 
> The default should be to operate correctly, and if correct operation
> is not supported the default should be to fail.  (Cf security.)

Er, yes, that's absolutely correct, not sure what I was thinking.

Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 15:12       ` Ian Jackson
@ 2013-12-12 18:37         ` Shriram Rajagopalan
  2013-12-12 18:43           ` Ian Jackson
  2013-12-12 18:47           ` Brendan Cully
  0 siblings, 2 replies; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-12-12 18:37 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org


[-- Attachment #1.1: Type: text/plain, Size: 2551 bytes --]

On Thu, Dec 12, 2013 at 7:12 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:

> Shriram Rajagopalan writes ("Re: [PATCH 4 of 4 V5] tools/xl: Remus -
> Network buffering cmdline switch"):
> > On Mon, Nov 25, 2013 at 7:37 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
> >     Shriram Rajagopalan writes ("[PATCH 4 of 4 V5] tools/xl: Remus -
> Network
> >     buffering cmdline switch"):
> >     > tools/xl: Remus - Network buffering cmdline switch
> ...
> >     Shouldn't enabling network buffering be the default ?
> >
>
> > Given that network buffering support is conditionally compiled into
> > libxl, it makes sense to let the user explicitly enable network
> > buffering.  If network buffering was enabled by default, but libxl
> > was missing the feature, then running xl remus (without any option)
> > would immediately fail saying network buffering support is
> > missing. Might as well make people enable it explicitly and then
> > discover the lack of support for it inside libxl.
>
> Is Remus actually useable without network buffering ?  I don't think
> we would want to change the default later, and clearly the default
> should be to behave correctly.
>

It is usable without network buffering. Memory state is preserved on
failover.
On failover, without network buffering, exising network
connectivity (tcp connections) "may" be lost, under some circumstances.

You are right about the fact that for Remus to behave 'correctly', one
needs both network and disk buffering -- the latter being absent is the
reason
Remus in libxl is still experimental.

So, might as well enable network buffering by default, punt if the support
is
missing and suggest that the user invoke Remus without network buffering
(while providing sufficient warnings that the failover may be incomplete)


> >     Is it really useful to have a command-line option to change the
> script ?
> >
> > That was added on Ian C's suggestion. It might be useful if people have
> > openvswitch based setups and want to tweak the network buffering setup.
>
> I think this should be a per-vif configuration option, with a default
> taken from the domain and/or global config.  Not a command-line
> option.
>
>
This is getting needlessly complicated. These are "nice to have" features,
which I am least concerned about at this moment.
I am more than happy to fall back to simply using the default script
specified in xl.conf,
for the initial version.  If users want to provide a per-vif configuration
script, the support
can be integrated later.


> Thanks,
> Ian.
>
>

[-- Attachment #1.2: Type: text/html, Size: 3655 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 18:37         ` Shriram Rajagopalan
@ 2013-12-12 18:43           ` Ian Jackson
  2013-12-12 18:47           ` Brendan Cully
  1 sibling, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2013-12-12 18:43 UTC (permalink / raw)
  To: rshriram
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

Shriram Rajagopalan writes ("Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch"):
> So, might as well enable network buffering by default, punt if the support is
> missing and suggest that the user invoke Remus without network buffering
> (while providing sufficient warnings that the failover may be incomplete)

Right.

>     I think this should be a per-vif configuration option, with a default
>     taken from the domain and/or global config.  Not a command-line
>     option.
...
> This is getting needlessly complicated. These are "nice to have" features, 
> which I am least concerned about at this moment. 
> I am more than happy to fall back to simply using the default script
> specified in xl.conf, for the initial version.  If users want to
> provide a per-vif configuration script, the support can be
> integrated later.

Fair enough.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 18:37         ` Shriram Rajagopalan
  2013-12-12 18:43           ` Ian Jackson
@ 2013-12-12 18:47           ` Brendan Cully
  2013-12-12 18:52             ` Shriram Rajagopalan
  1 sibling, 1 reply; 22+ messages in thread
From: Brendan Cully @ 2013-12-12 18:47 UTC (permalink / raw)
  To: Shriram Rajagopalan
  Cc: Ian Campbell, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	xen-devel@lists.xen.org, Roger Pau Monne

On Thursday, 12 December 2013 at 10:37, Shriram Rajagopalan wrote:
> On Thu, Dec 12, 2013 at 7:12 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:
> 
> > Shriram Rajagopalan writes ("Re: [PATCH 4 of 4 V5] tools/xl: Remus -
> > Network buffering cmdline switch"):
> > > On Mon, Nov 25, 2013 at 7:37 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>
> > wrote:
> > >     Shriram Rajagopalan writes ("[PATCH 4 of 4 V5] tools/xl: Remus -
> > Network
> > >     buffering cmdline switch"):
> > >     > tools/xl: Remus - Network buffering cmdline switch
> > ...
> > >     Shouldn't enabling network buffering be the default ?
> > >
> >
> > > Given that network buffering support is conditionally compiled into
> > > libxl, it makes sense to let the user explicitly enable network
> > > buffering.  If network buffering was enabled by default, but libxl
> > > was missing the feature, then running xl remus (without any option)
> > > would immediately fail saying network buffering support is
> > > missing. Might as well make people enable it explicitly and then
> > > discover the lack of support for it inside libxl.
> >
> > Is Remus actually useable without network buffering ?  I don't think
> > we would want to change the default later, and clearly the default
> > should be to behave correctly.
> >
> 
> It is usable without network buffering. Memory state is preserved on
> failover.
> On failover, without network buffering, exising network
> connectivity (tcp connections) "may" be lost, under some circumstances.

uh, wouldn't it be possible to acknowledge client requests whose
backing state hasn't been replicated? So that on failover the backup
might have no memory of a promised it had made? it's more than just
tcp resets, isn't it?

> You are right about the fact that for Remus to behave 'correctly', one
> needs both network and disk buffering -- the latter being absent is the
> reason
> Remus in libxl is still experimental.
> 
> So, might as well enable network buffering by default, punt if the support
> is
> missing and suggest that the user invoke Remus without network buffering
> (while providing sufficient warnings that the failover may be incomplete)
> 
> 
> > >     Is it really useful to have a command-line option to change the
> > script ?
> > >
> > > That was added on Ian C's suggestion. It might be useful if people have
> > > openvswitch based setups and want to tweak the network buffering setup.
> >
> > I think this should be a per-vif configuration option, with a default
> > taken from the domain and/or global config.  Not a command-line
> > option.
> >
> >
> This is getting needlessly complicated. These are "nice to have" features,
> which I am least concerned about at this moment.
> I am more than happy to fall back to simply using the default script
> specified in xl.conf,
> for the initial version.  If users want to provide a per-vif configuration
> script, the support
> can be integrated later.
> 
> 
> > Thanks,
> > Ian.
> >
> >

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 18:47           ` Brendan Cully
@ 2013-12-12 18:52             ` Shriram Rajagopalan
  2013-12-12 18:55               ` Brendan Cully
  0 siblings, 1 reply; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-12-12 18:52 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper, Roger Pau Monne, Stefano Stabellini,
	Ian Campbell, xen-devel@lists.xen.org, Brendan Cully


[-- Attachment #1.1: Type: text/plain, Size: 1159 bytes --]

>
>
> > >
> > > Is Remus actually useable without network buffering ?  I don't think
> > > we would want to change the default later, and clearly the default
> > > should be to behave correctly.
> > >
> >
> > It is usable without network buffering. Memory state is preserved on
> > failover.
> > On failover, without network buffering, exising network
> > connectivity (tcp connections) "may" be lost, under some circumstances.
>
> uh, wouldn't it be possible to acknowledge client requests whose
> backing state hasn't been replicated? So that on failover the backup
> might have no memory of a promised it had made? it's more than just
> tcp resets, isn't it?
>
>
true.. which is why I stated ..

 > You are right about the fact that for Remus to behave 'correctly', one
> > needs both network and disk buffering -- the latter being absent is the
> > reason
> > Remus in libxl is still experimental.
> >
> > So, might as well enable network buffering by default, punt if the
> support
> > is
> > missing and suggest that the user invoke Remus without network buffering
> > (while providing sufficient warnings that the failover may be incomplete)
> >
>

:)

[-- Attachment #1.2: Type: text/html, Size: 1772 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch
  2013-12-12 18:52             ` Shriram Rajagopalan
@ 2013-12-12 18:55               ` Brendan Cully
  0 siblings, 0 replies; 22+ messages in thread
From: Brendan Cully @ 2013-12-12 18:55 UTC (permalink / raw)
  To: Shriram Rajagopalan
  Cc: Ian Campbell, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	xen-devel@lists.xen.org, Roger Pau Monne

On Thursday, 12 December 2013 at 10:52, Shriram Rajagopalan wrote:
> >
> >
> > > >
> > > > Is Remus actually useable without network buffering ?  I don't think
> > > > we would want to change the default later, and clearly the default
> > > > should be to behave correctly.
> > > >
> > >
> > > It is usable without network buffering. Memory state is preserved on
> > > failover.
> > > On failover, without network buffering, exising network
> > > connectivity (tcp connections) "may" be lost, under some circumstances.
> >
> > uh, wouldn't it be possible to acknowledge client requests whose
> > backing state hasn't been replicated? So that on failover the backup
> > might have no memory of a promised it had made? it's more than just
> > tcp resets, isn't it?
> >
> >
> true.. which is why I stated ..
> 
>  > You are right about the fact that for Remus to behave 'correctly', one
> > > needs both network and disk buffering -- the latter being absent is the
> > > reason
> > > Remus in libxl is still experimental.

ah, maybe you meant to leave the quotes off of 'correctly'.

> > >
> > > So, might as well enable network buffering by default, punt if the
> > support
> > > is
> > > missing and suggest that the user invoke Remus without network buffering
> > > (while providing sufficient warnings that the failover may be incomplete)
> > >
> >
> 
> :)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code
  2013-11-25 15:32   ` Ian Jackson
@ 2013-12-19 19:37     ` Shriram Rajagopalan
  0 siblings, 0 replies; 22+ messages in thread
From: Shriram Rajagopalan @ 2013-12-19 19:37 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Roger Pau Monne, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org


[-- Attachment #1.1: Type: text/plain, Size: 16079 bytes --]

On Mon, Nov 25, 2013 at 9:32 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:

> Shriram Rajagopalan writes ("[PATCH 2 of 4 V5] tools/libxl: Remus network
> buffering - hotplug scripts and setup code"):
> > tools/libxl: Remus network buffering - hotplug scripts and setup code
>
> Thanks.  This is going in the right direction, but I think there's
> still some way to go.
>
>
> One thing that makes reviewing it particularly difficult is the
> physical ordering of the new code in your patch.
>
> The callback programming model makes it easy to write spaghetti logic;
> to counter this we try to have a firm structure to each set of event
> handling code:
>
>  * Firstly, function forward declarations, types, structs, etc.
>
>  * Then helper functions which have a simple "call this and it
>    does something and returns" control flow.
>
>  * Then the main body of the event callbacks, in the order in
>    which they will normally be called.  Obviously there is some
>    judgement needed because the call flow won't always be entirely
>    linear.  But keeping the layout as linear as possible means that
>    one is never hunting for the next function.
>
>  * Call flow of a single operation does not bounce back and forth
>    between files.  If this is necessary a formal sub-operation is
>    defined (see how it's done with the bootloader, for example).
>
> I think you would be able to structure the code this way.  Would you
> do that please ?
>
> I think you may need to move several of the entrypoints into a new
> "remus" file.  The "netbuf" file may need to become a more formally
> separated part of the suspend/resume function.
>
> We also try to break out sub-operations, giving them their own types
> and callbacks.
>
>
> > diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/Makefile
> > --- a/tools/libxl/Makefile    Mon Nov 18 10:17:35 2013 -0800
> > +++ b/tools/libxl/Makefile    Mon Nov 18 11:09:34 2013 -0800
> > @@ -42,6 +42,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
> > -static void remus_failover_cb(libxl__egc *egc,
> > -                              libxl__domain_suspend_state *dss, int rc);
> > +static void remus_replication_failure_cb(libxl__egc *egc,
> > +                                         libxl__domain_suspend_state
> *dss,
> > +                                         int rc);
> ...
> > -    dss->remus = info;
>
> This change from dss->remus to dss->remus_ctx needs a mention in the
> commit message.  To be honest, I'm not sure why you don't just change
> the type of the remus member.
>
> > diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_internal.h
> > --- a/tools/libxl/libxl_internal.h    Mon Nov 18 10:17:35 2013 -0800
> > +++ b/tools/libxl/libxl_internal.h    Mon Nov 18 11:09:34 2013 -0800
> > @@ -2291,6 +2291,50 @@ typedef struct libxl__logdirty_switch {
> >      libxl__ev_time timeout;
> >  } libxl__logdirty_switch;
> >
> > +typedef struct libxl__remus_ctx {
> > +    /* checkpoint interval */
> > +    int interval;
> > +    int blackhole;
> > +    int compression;
> > +    int saved_rc;
> > +    /* Script to setup/teardown network buffers */
> > +    const char *netbufscript;
> > +    /* Opaque context containing network buffer related stuff */
> > +    void *netbuf_ctx;
> > +    libxl__domain_suspend_state *dss;
> > +    libxl__ev_time timeout;
> > +    libxl__ev_child child;
> > +    int num_exec;
> > +} libxl__remus_ctx;
>
> I think you should probably call this a "libxl__remus_state", to
> correspond to all of the other "libxl__foo_state" structs.
>
> You also need to specify in comments who fills in what parts of it, as
> is done in the other libxl__foo_state's.
>
>
> > diff -r 4b471809a5bb -r 5b4e029bb79b tools/libxl/libxl_netbuffer.c
> > --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> > +++ b/tools/libxl/libxl_netbuffer.c   Mon Nov 18 11:09:34 2013 -0800
> > @@ -0,0 +1,516 @@
> ...
> > +typedef struct libxl__remus_netbuf_ctx {
> > +    struct rtnl_qdisc **netbuf_qdisc_list;
> > +    struct nl_sock *nlsock;
> > +    struct nl_cache *qdisc_cache;
> > +    char **vif_list;
> > +    char **ifb_list;
> > +    uint32_t num_netbufs;
> > +    uint32_t unused;
> > +} libxl__remus_netbuf_ctx;
>
> Presumably this wants to be a libxl__remus_netbuf_state, then.
>
> The callbacks from the remus and netbuf code to the suspend code
> should not be hardwired.  Instead, function pointers should be passed,
> the way the domain creation stuff does with a bootloader, and the way
> the bootloader does with openpty.
>
> This may seem like extra complication if there is only one caller and
> it will always pass the same function, but it makes the logical
> separation much clearer.  It's analogous to the separation between an
> ordinary function and its caller, even if there is in fact only one
> call site.
>
> That is, functions like this
>
> +_hidden void libxl__remus_setup_done(libxl__egc *egc,
> +                                     libxl__domain_suspend_state *dss,
> +                                     int rc);
>
> should be private functions in the remus file which make the
> appropriate callback.
>
>
> > +/* If the device has a vifname, then use that instead of
> > + * the vifX.Y format.
> > + */
> > +static char *get_vifname(libxl__gc *gc, uint32_t domid,
> > +                               libxl_device_nic *nic)
> > +{
> > +    const char *vifname = NULL;
> > +    vifname = libxl__xs_read(gc, XBT_NULL,
> > +                             libxl__sprintf(gc,
> > +
>  "%s/backend/vif/%d/%d/vifname",
> > +                                            libxl__xs_get_dompath(gc,
> 0),
> > +                                            domid, nic->devid));
>
> Why not libxl__xs_read_checked ?
>
> > +    if (!vifname) {
>
> Surely this error handling is wrong.  NULL here might mean "failed" or
> ENOENT.  If it means failed then get_vifname needs to return the error
> to its caller, by returning NULL.
>
> > +        vifname = libxl__device_nic_devname(gc, domid,
> > +                                            nic->devid,
> > +                                            nic->nictype);
> > +    }
> > +
> > +    return libxl__strdup(gc, vifname);
>
> Why this strdup ?
>
> > +static char **get_guest_vif_list(libxl__gc *gc, uint32_t domid,
> > +                                 int *num_vifs)
> > +{
> > +    libxl_device_nic *nics = NULL;
> > +    int nb, i = 0;
> > +    char **vif_list = NULL;
> > +
> > +    *num_vifs = 0;
> > +    nics = libxl_device_nic_list(CTX, domid, &nb);
> > +    if (!nics) return NULL;
> > +
> > +    /* Ensure that none of the vifs are backed by driver domains */
> > +    for (i = 0; i < nb; i++) {
> > +        if (nics[i].backend_domid != LIBXL_TOOLSTACK_DOMID) {
> > +            LOG(ERROR, "vif %s has driver domain (%u) as its backend.\n"
> > +                "Network buffering is not supported with driver
> domains",
>
> The LOG macros should not normally be passed messages with newlines
> in.
>
> > +                get_vifname(gc, domid, &nics[i]),
> nics[i].backend_domid);
> > +            *num_vifs = -1;
>
> This calling convention is strange.  I think get_guest_vif_list should
> return a libxl rc value.
>
> > +    vif_list = libxl__calloc(gc, nb, sizeof(char *));
> > +    for (i = 0; i < nb; ++i)
> > +        vif_list[i] = get_vifname(gc, domid, &nics[i]);
>
> What if get_vifname fails ?
>
>

The idea here is that if the interface has a name, then we use it.
Otherwise,
we simply revert to the name that Xen assigns it - which is in vifX.Y
format.
But I agree with you that the vifname reading process from xenstore may not
be successful always.


> ...
>
> > +
> > +static int init_qdiscs(libxl__gc *gc,
> > +                       libxl__remus_ctx *remus_ctx)
> > +{
> > +    int i, ret, ifindex;
> > +    struct rtnl_link *ifb = NULL;
> > +    struct rtnl_qdisc *qdisc = NULL;
> ...
> > +    /* convenience vars */
>
> We use the term "Convenience aliases".  It's best to use the exact
> same terminology as elsewhere.
>
> > +    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
> > +    int num_netbufs = netbuf_ctx->num_netbufs;
> > +    char **ifb_list = netbuf_ctx->ifb_list;
>
> These should all be declared const.
>
> > +    /* list of handles to plug qdiscs */
> > +    netbuf_ctx->netbuf_qdisc_list =
> > +        libxl__calloc(gc, num_netbufs,
> > +                      sizeof(struct rtnl_qdisc *));
>
> Please use GCNEW_ARRAY.
>
> > +        if (qdisc) {
> > +            const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
> > +            /* Sanity check: Ensure that the root qdisc is a plug
> qdisc. */
> > +            if (!tc_kind || strcmp(tc_kind, "plug")) {
> > +                LOG(ERROR, "plug qdisc is not installed on %s",
> ifb_list[i]);
>
> You seem to assume that all the failures from rtnl_tc_get_kind are the
> due to the expected cause.  You need to check errno, I think.
>

Not according to the documentation. Its return value is either the name of
the
qdisc or NULL if its not set. There is no mention of errno.
I can only go by whats documented :).


> > +    rtnl_link_put(ifb);
> > +    return 0;
> > +
> > + end:
>
> We call these "out", not "end".  This should be changed everywhere.
>
> > +    if (ifb) rtnl_link_put(ifb);
> > +    return ERROR_FAIL;
>
> Can't this leak netbuf_ctx->nlsock and netbuf_ctx->qdisc_cache ?
>
>
You asked this question before (in a previous version). There is a cleanup
code that gets called that releases these resources (the teardown function).

>
> > +}
> > +
> > +/* the script needs the following env & args
> > + * $vifname
> > + * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
> > + * $IFB (for teardown)
> > + * setup/teardown as command line arg.
> > + * In return, the script writes the name of IFB device (during setup)
> to be
> > + * used for output buffering into XENBUS_PATH/ifb
> > + */
>
> This information should be somewhere else, not buried in the libxl
> source code.  For now, could it be in the one script which you
> actually supply ?
>
>
It is also documented in the script.


>
> > +static int exec_netbuf_script(libxl__gc *gc, libxl__remus_ctx
> *remus_ctx,
> > +                              char *op, libxl__ev_child_callback *death)
> > +{
> > +    int arraysize, nr = 0;
> > +    char **env = NULL, **args = NULL;
> > +    pid_t pid;
>
> This code has many important similarities to the device_hotplug
> functions in libxl_device.c.
>
> For example your timeout callback is pretty much identical to
> device_hotplug_timeout_cb.
>
> I think you should consider whether you can use libxl__ao_device, and
> perhaps share other parts of that code.
>
>
I looked at them before deciding to roll out my own.  Please correct me if
I am
wrong about this:
The ao_device interface while sharing some similarities, does not really
fit in here.
It attempts bring up/down a device (that has been created already, for
e.g., a vif),
add it to the guest and represent it internally using libxl__device or
something.

The netbuf code invokes the script, which in turn sets up an IFB device
that is not
part of the guest's configuration.  The information about the device
returned by the
script is collected in an internal data structure and new handles (plug
qdisc handle)
are created, and added to remus' state.

The control flows are different.  The only commonality is invoke external
script, wait
until it finishes and invoke callback.

On alternative is to hack into the device_hotplug code, and add special
cases. For e.g.,
if device_ptr is null, then this is a remus op, so do Y instead of X.

This kind of special casing would percolate throughout the device hotplug
code,
polluting a what was otherwise cleanly defined interface to handle generic
device hotplug.


If you have any other alternatives in mind, I would be happy to check it
out.


>
> > +    libxl__ev_child_init(child);
>
> This should be done where the remus state is first populated, not
> here.
>
>
> > +    char *script = libxl__strdup(gc, remus_ctx->netbufscript);
>
> These next lot are all convenience aliases AFIACT.  This should be
> mentioned, and they should all be const:
>
> > +    libxl__ev_child *child = &remus_ctx->child;
> > +    libxl__ev_time *timeout = &remus_ctx->timeout;
> > +    uint32_t domid = remus_ctx->dss->domid;
> > +    int devid = remus_ctx->num_exec;
> > +    libxl__remus_netbuf_ctx *netbuf_ctx = remus_ctx->netbuf_ctx;
> > +    char *vif = netbuf_ctx->vif_list[devid];
> > +    char *ifb = netbuf_ctx->ifb_list[devid];
>
> So instead, something like:
>
>   +    libxl__ev_child *const child = &remus_ctx->child;
> ...
>   +    const uint32_t domid = remus_ctx->dss->domid;
> ...
>   +    char *const ifb = netbuf_ctx->ifb_list[devid];
>
> There are other places in the code with a similar pattern which also
> need to be changed.
>
>
> > +    if (!strcmp(op, "setup"))
> > +        arraysize = 5;
> > +    else
> > +        arraysize = 7;
>
> Perhaps we could do away with this if() and just always allocate 7 ?
>
> > +    GCNEW_ARRAY(env, arraysize);
> > +    env[nr++] = "vifname";
> > +    env[nr++] = vif;
> > +    env[nr++] = "XENBUS_PATH";
> > +    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
> > +                          libxl__xs_libxl_path(gc, domid), devid);
> > +    if (!strcmp(op, "teardown")) {
>
> I think you should pass op as an enum, and convert it to a string
> right at the end.  That would do away with these strcmps.
>
> > +    remus_ctx->num_exec++;
>
> num_exec is a misleading variable name here.  In libxl_linux.c and
> libxl_device.c it refers to the number of times we have run the
> hotplug script _for each device_, not the iteration over the number of
> devices.
>
> > +static void netbuf_setup_script_cb(libxl__egc *egc,
> > +                                   libxl__ev_child *child,
> > +                                   pid_t pid, int status)
> > +{
> ...
> > +    hotplug_error = libxl__xs_read(gc, XBT_NULL,
> > +                                   GCSPRINTF("%s/hotplug-error",
> > +                                             out_path_base));
>
> Again, libxl__xs_read_checked (several times) ?
>
> > +static void netbuf_setup_timeout_cb(libxl__egc *egc,
> > +                                    libxl__ev_time *ev,
> > +                                    const struct timeval *requested_abs)
> > +{
>
> This seems to be yet another copy of device_hotplug_timeout_cb.
>
>
Yep. See my answer above.

 > +void libxl__remus_netbuf_setup(libxl__egc *egc,
> > +                               libxl__domain_suspend_state *dss)
> > +{
> > +    libxl__remus_netbuf_ctx *netbuf_ctx = NULL;
> > +    uint32_t domid = dss->domid;
> > +    libxl__remus_ctx *remus_ctx = dss->remus_ctx;
> > +    int num_netbufs = 0;
> > +    int rc = ERROR_FAIL;
> ...
> > +    netbuf_ctx = libxl__zalloc(gc, sizeof(libxl__remus_netbuf_ctx));
>
> Please use GCNEW, not libxl__zalloc, when you can.
>
> > +/* Note: This function will be called in the same gc context as
> > + * libxl__remus_netbuf_setup, created during the
> libxl_domain_remus_start
> > + * API call.
> > + */
> > +void libxl__remus_netbuf_teardown(libxl__egc *egc,
> > +                                  libxl__domain_suspend_state *dss)
> > +{
> ...
> > +    remus_ctx->num_exec = 0; //start at devid == 0
> > +    if (exec_netbuf_script(gc, remus_ctx, "teardown",
> > +                           netbuf_teardown_script_cb))
> > +        libxl__remus_teardown_done(egc, dss);
>
> I think the return value from exec_netbuf_script should be put into an
> rc variable.  And we don't correctly propagate the rc here AFAICT.
>
> > +#define TC_BUFFER_START 1
> > +#define TC_BUFFER_RELEASE 2
>
> Are these #defines not supposed to come from some Linux header file ?
> I don't think having them as constants here can be right.
>
>
They are not constants taken from any library and stuck in here.  They are
merely
for readability.  The related code is remus_netbuffer_op, where you would
find
the buffer/release code blocks controlled through the value of op variable
(which is either BUFFER_START or BUFFER_RELEASE)

 thanks
shriram

[-- Attachment #1.2: Type: text/html, Size: 21190 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2013-12-19 19:37 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-18 20:03 [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Shriram Rajagopalan
2013-11-18 20:03 ` [PATCH 1 of 4 V5] remus: add libnl3 dependency to autoconf scripts Shriram Rajagopalan
2013-11-18 20:03 ` [PATCH 2 of 4 V5] tools/libxl: Remus network buffering - hotplug scripts and setup code Shriram Rajagopalan
2013-11-25 15:32   ` Ian Jackson
2013-12-19 19:37     ` Shriram Rajagopalan
2013-11-18 20:03 ` [PATCH 3 of 4 V5] tools/libxl: Control network buffering in remus callbacks Shriram Rajagopalan
2013-11-25 15:38   ` Ian Jackson
2013-11-18 20:03 ` [PATCH 4 of 4 V5] tools/xl: Remus - Network buffering cmdline switch Shriram Rajagopalan
2013-11-25 15:37   ` Ian Jackson
2013-12-12  5:45     ` Shriram Rajagopalan
2013-12-12 11:07       ` Ian Campbell
2013-12-12 15:13         ` Ian Jackson
2013-12-12 15:36           ` Ian Campbell
2013-12-12 15:12       ` Ian Jackson
2013-12-12 18:37         ` Shriram Rajagopalan
2013-12-12 18:43           ` Ian Jackson
2013-12-12 18:47           ` Brendan Cully
2013-12-12 18:52             ` Shriram Rajagopalan
2013-12-12 18:55               ` Brendan Cully
2013-11-19 16:23 ` [PATCH 0 of 4 V5] Remus/Libxl: Network buffering support Ian Campbell
2013-11-19 17:58   ` Shriram Rajagopalan
2013-11-24  6:51     ` Shriram Rajagopalan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).