* [PATCH v18 01/11] libxl: introduce libxl__multidev_prepare_with_aodev
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
@ 2014-07-28 9:23 ` Yang Hongyang
2014-07-29 17:21 ` Ian Jackson
2014-07-28 9:23 ` [PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device Yang Hongyang
` (10 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:23 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
libxl__multidev_prepare_with_aodev is similar to libxl__multidev_prepare,
but takes a libxl__ao_device as an extra argument.
libxl__multidev_prepare is now a wrapper around
libxl__multidev_prepare_with_aodev.
This new internal API will be used by the Remus device abstract layer
for handling various Remus devices.
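
The split described above, where one function prepares a caller-supplied object and a thin wrapper allocates one and delegates, can be sketched in isolation. This is a minimal illustration only: `struct item`, `struct batch`, `batch_prepare_with_item` and `batch_prepare` are hypothetical stand-ins for `libxl__ao_device`, `libxl__multidev` and the two libxl functions, and plain `realloc`/`calloc` replace libxl's GC allocation macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libxl__ao_device and libxl__multidev. */
struct item { int prepared; };
struct batch {
    struct item **array;
    size_t used, allocd;
};

/* Prepare a caller-supplied item and record it in the batch.
 * Returns 0 on success, -1 if the tracking array could not grow. */
int batch_prepare_with_item(struct batch *b, struct item *it)
{
    if (b->used == b->allocd) {
        size_t n = b->allocd ? b->allocd * 2 : 4;
        struct item **p = realloc(b->array, n * sizeof(*p));
        if (!p)
            return -1;
        b->array = p;
        b->allocd = n;
    }
    it->prepared = 1;
    b->array[b->used++] = it;
    return 0;
}

/* Convenience wrapper: allocate the item, then delegate. */
struct item *batch_prepare(struct batch *b)
{
    struct item *it = calloc(1, sizeof(*it));
    if (!it)
        return NULL;
    if (batch_prepare_with_item(b, it)) {
        free(it);
        return NULL;
    }
    return it;
}
```

The point of the first variant is that the caller owns the storage, so the item can live inside a larger structure (presumably how a Remus device would embed its own aodev), while existing callers keep using the allocating wrapper unchanged.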
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
tools/libxl/libxl_device.c | 13 ++++++++++---
tools/libxl/libxl_internal.h | 14 +++++++++++---
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index f8a2e1b..9180732 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -463,11 +463,10 @@ void libxl__multidev_begin(libxl__ao *ao, libxl__multidev *multidev)
static void multidev_one_callback(libxl__egc *egc, libxl__ao_device *aodev);
-libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
+void libxl__multidev_prepare_with_aodev(libxl__multidev *multidev,
+ libxl__ao_device *aodev) {
STATE_AO_GC(multidev->ao);
- libxl__ao_device *aodev;
- GCNEW(aodev);
aodev->multidev = multidev;
aodev->callback = multidev_one_callback;
libxl__prepare_ao_device(ao, aodev);
@@ -477,6 +476,14 @@ libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
GCREALLOC_ARRAY(multidev->array, multidev->allocd);
}
multidev->array[multidev->used++] = aodev;
+}
+
+libxl__ao_device *libxl__multidev_prepare(libxl__multidev *multidev) {
+ STATE_AO_GC(multidev->ao);
+ libxl__ao_device *aodev;
+
+ GCNEW(aodev);
+ libxl__multidev_prepare_with_aodev(multidev, aodev);
return aodev;
}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index beb052e..611b9fb 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2148,9 +2148,17 @@ struct libxl__ao_device {
/* Starts preparing to add/remove a bunch of devices. */
_hidden void libxl__multidev_begin(libxl__ao *ao, libxl__multidev*);
-/* Prepares to add/remove one of many devices. Returns a libxl__ao_device
- * which has had libxl__prepare_ao_device called, and which has also
- * had ->callback set. The user should not mess with aodev->callback. */
+/* Prepares to add/remove one of many devices.
+ * Calls libxl__prepare_ao_device on the libxl__ao_device argument provided and
+ * also sets the ->callback. The user should not mess with aodev->callback.
+ */
+_hidden void libxl__multidev_prepare_with_aodev(libxl__multidev*,
+ libxl__ao_device*);
+
+/* A wrapper function around libxl__multidev_prepare_with_aodev.
+ * Allocates a libxl__ao_device and prepares it for addition/removal.
+ * Returns the newly allocated libxl__ao_device.
+ */
_hidden libxl__ao_device *libxl__multidev_prepare(libxl__multidev*);
/* Notifies the multidev machinery that we have now finished preparing
--
1.9.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
2014-07-28 9:23 ` [PATCH v18 01/11] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
@ 2014-07-28 9:23 ` Yang Hongyang
2014-07-29 17:24 ` Ian Jackson
2014-07-28 9:23 ` [PATCH v18 03/11] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
` (9 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:23 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Extend libxl__ao_device with a libxl__ev_child member, which can be
used to asynchronously execute functions that take a long time to complete.
Remus uses this functionality to execute functions that involve blocking
system calls.
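
The general idea of running a blocking call in a forked child, so that the parent can keep servicing its event loop, can be sketched with plain POSIX calls. This is not libxl's `libxl__ev_child` interface: `slow_operation`, `start_in_child` and `poll_child` are hypothetical names, and the sketch polls with `waitpid(WNOHANG)` rather than integrating with an event loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical blocking operation; stands in for a synchronous-only syscall. */
int slow_operation(void)
{
    struct timespec ts = { 0, 10 * 1000 * 1000 }; /* block for about 10 ms */
    nanosleep(&ts, NULL);
    return 0;
}

/* Start fn in a child process and return its pid (-1 if fork failed).
 * The parent does not block; the child's exit status carries the result. */
pid_t start_in_child(int (*fn)(void))
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(fn() == 0 ? 0 : 1);
    return pid;
}

/* Poll once for completion without blocking.
 * Returns 1 and sets *failed if the child has exited,
 * 0 if it is still running, -1 on error. */
int poll_child(pid_t pid, int *failed)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0)
        return 0;
    if (r < 0)
        return -1;
    *failed = !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return 1;
}
```

A caller would invoke `start_in_child(slow_operation)` once and then call `poll_child` from its main loop, or react to SIGCHLD, until it reports completion.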
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
tools/libxl/libxl_device.c | 1 +
tools/libxl/libxl_internal.h | 2 ++
2 files changed, 3 insertions(+)
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c
index 9180732..89dc824 100644
--- a/tools/libxl/libxl_device.c
+++ b/tools/libxl/libxl_device.c
@@ -435,6 +435,7 @@ void libxl__prepare_ao_device(libxl__ao *ao, libxl__ao_device *aodev)
/* We init this here because we might call device_hotplug_done
* without actually calling any hotplug script */
libxl__async_exec_init(&aodev->aes);
+ libxl__ev_child_init(&aodev->child);
}
/* multidev */
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 611b9fb..4bc042b 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2128,6 +2128,8 @@ struct libxl__ao_device {
int num_exec;
/* for calling hotplug scripts */
libxl__async_exec_state aes;
+ /* for executing functions asynchronously */
+ libxl__ev_child child;
};
/*
--
1.9.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device
2014-07-28 9:23 ` [PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device Yang Hongyang
@ 2014-07-29 17:24 ` Ian Jackson
2014-07-30 8:45 ` Hongyang Yang
0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2014-07-29 17:24 UTC (permalink / raw)
To: Yang Hongyang
Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
xen-devel, rshriram, ian.campbell
Yang Hongyang writes ("[PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device"):
> Extend libxl__ao_device with a libxl__ev_child member, which can be
> used to asynchronously execute functions that take a long time to complete.
The code change is fine.
I think the commit message and the comment should explain that this
member is used only for syscalls where only a synchronous version is
provided.
How about
Extend libxl__ao_device with a libxl__ev_child member.
This can be used to fork children to allow the asynchronous execution
of system calls which only come in a synchronous variant. This will
be useful for Remus, in the following patches.
instead ?
> + /* for executing functions asynchronously */
> + libxl__ev_child child;
And
+ /* for asynchronous execution of synchronous-only syscalls etc. */
Thanks,
Ian.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device
2014-07-29 17:24 ` Ian Jackson
@ 2014-07-30 8:45 ` Hongyang Yang
0 siblings, 0 replies; 20+ messages in thread
From: Hongyang Yang @ 2014-07-30 8:45 UTC (permalink / raw)
To: Ian Jackson
Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
xen-devel, rshriram, ian.campbell
On 07/30/2014 01:24 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device"):
>> Extend libxl__ao_device with a libxl__ev_child member, which can be
>> used to asynchronously execute functions that take a long time to complete.
>
> The code change is fine.
>
> I think the commit message and the comment should explain that this
> member is used only for syscalls where only a synchronous version is
> provided.
>
> How about
>
> Extend libxl__ao_device with a libxl__ev_child member.
>
> This can be used to fork children to allow the asynchronous execution
> of system calls which only come in a synchronous variant. This will
> be useful for Remus, in the following patches.
>
> instead ?
>
>> + /* for executing functions asynchronously */
>> + libxl__ev_child child;
>
> And
> + /* for asynchronous execution of synchronous-only syscalls etc. */
It is better, thank you!
>
> Thanks,
> Ian.
> .
>
--
Thanks,
Yang.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v18 03/11] autoconf: add libnl3 dependency for Remus network buffering support
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
2014-07-28 9:23 ` [PATCH v18 01/11] libxl: introduce libxl__multidev_prepare_with_aodev Yang Hongyang
2014-07-28 9:23 ` [PATCH v18 02/11] libxl: add support for async. function calls when using libxl__ao_device Yang Hongyang
@ 2014-07-28 9:23 ` Yang Hongyang
2014-07-28 9:23 ` [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
` (8 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:23 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Libnl3 is required for controlling Remus network buffering.
This patch adds a dependency on libnl3 (>= 3.2.8) to the autoconf scripts.
It also provides the ability to configure the tools without libnl3 support,
i.e., without network buffering support.
When there is no network buffering support, libxl__netbuffer_enabled()
returns 0; otherwise it returns 1. The callers of this API will be
introduced in the rest of the series.
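
As a rough illustration of how such a build-time switch might be consumed, the sketch below lets a caller check the feature before attempting to use it. Everything here is hypothetical: `SKETCH_HAVE_NETBUF`, `netbuffer_enabled` and `start_protection` are illustration names, and a preprocessor symbol imitates what the patch achieves by having the Makefile compile one of two source files.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: a build system would compile exactly one of these two
 * definitions; a preprocessor symbol imitates that choice here. */
#ifdef SKETCH_HAVE_NETBUF
int netbuffer_enabled(void) { return 1; }
#else
int netbuffer_enabled(void) { return 0; }
#endif

/* Hypothetical caller: reject a request for network buffering up front
 * when the library was built without it, instead of failing later. */
int start_protection(int want_netbuf)
{
    if (want_netbuf && !netbuffer_enabled()) {
        fprintf(stderr, "network buffering support not compiled in\n");
        return -1;
    }
    return 0;
}
```

Keeping the predicate as a function rather than an `#ifdef` at every call site means callers compile identically in both configurations.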
NOTE: This patch changes tools/configure.ac, please rerun
autogen.sh while applying the patch.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
README | 4 ++++
config/Tools.mk.in | 4 ++++
docs/README.remus | 6 ++++++
tools/configure.ac | 16 ++++++++++++++++
tools/libxl/Makefile | 13 +++++++++++++
tools/libxl/libxl_internal.h | 1 +
tools/libxl/libxl_netbuffer.c | 31 +++++++++++++++++++++++++++++++
tools/libxl/libxl_nonetbuffer.c | 31 +++++++++++++++++++++++++++++++
8 files changed, 106 insertions(+)
create mode 100644 tools/libxl/libxl_netbuffer.c
create mode 100644 tools/libxl/libxl_nonetbuffer.c
diff --git a/README b/README
index 9bbe734..e770932 100644
--- a/README
+++ b/README
@@ -72,6 +72,10 @@ disabled at compile time:
* cmake (if building vtpm stub domains)
* markdown
* figlet (for generating the traditional Xen start of day banner)
+ * Development install of libnl3 (e.g., libnl-3-200,
+ libnl-3-dev, etc). Required if network buffering is desired
+ when using Remus with libxl. See tools/remus/README for detailed
+ information.
Second, you need to acquire a suitable kernel for use in domain 0. If
possible you should use a kernel provided by your OS distributor. If
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 748cc69..c47eafa 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -43,6 +43,9 @@ PTHREAD_LIBS := @PTHREAD_LIBS@
PTYFUNCS_LIBS := @PTYFUNCS_LIBS@
+LIBNL3_LIBS := @LIBNL3_LIBS@
+LIBNL3_CFLAGS := @LIBNL3_CFLAGS@
+
# Download GIT repositories via HTTP or GIT's own protocol?
# GIT's protocol is faster and more robust, when it works at all (firewalls
# may block it). We make it the default, but if your GIT repository downloads
@@ -62,6 +65,7 @@ CONFIG_BLKTAP1 := @blktap1@
CONFIG_BLKTAP2 := @blktap2@
CONFIG_VTPM := @vtpm@
CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
+CONFIG_REMUS_NETBUF := @remus_netbuf@
#System options
ZLIB := @zlib@
diff --git a/docs/README.remus b/docs/README.remus
index 9fa00fe..ddf5b55 100644
--- a/docs/README.remus
+++ b/docs/README.remus
@@ -2,3 +2,9 @@ Remus provides fault tolerance for virtual machines by sending continuous
checkpoints to a backup, which will activate if the target VM fails.
See the website at http://wiki.xen.org/wiki/Remus for details.
+
+Using Remus with libxl on Xen 4.5 and higher:
+ To enable network buffering, you need libnl 3.2.8
+ or higher along with the development headers and command line utilities.
+ If your distro does not have the appropriate libnl3 version, you can find
+ the latest source tarball of libnl3 at http://www.carisma.slowglass.com/~tgr/libnl/
diff --git a/tools/configure.ac b/tools/configure.ac
index 629d6a0..b10dac3 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -295,6 +295,22 @@ esac
# Checks for header files.
AC_CHECK_HEADERS([yajl/yajl_version.h sys/eventfd.h valgrind/memcheck.h utmp.h])
+# Check for libnl3 >=3.2.8. If present enable remus network buffering.
+PKG_CHECK_MODULES(LIBNL3, [libnl-3.0 >= 3.2.8 libnl-route-3.0 >= 3.2.8],
+ [libnl3_lib="y"], [libnl3_lib="n"])
+
+AS_IF([test "x$libnl3_lib" = "xn" ], [
+ AC_MSG_WARN([Disabling support for Remus network buffering.
+ Please install libnl3 libraries, command line tools and devel
+ headers - version 3.2.8 or higher])
+ AC_SUBST(remus_netbuf, [n])
+ ],[
+ AC_SUBST(remus_netbuf, [y])
+])
+
+AC_SUBST(LIBNL3_LIBS)
+AC_SUBST(LIBNL3_CFLAGS)
+
fi # ! $rump
AC_OUTPUT()
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index bd0db3b..eb63510 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -21,11 +21,17 @@ endif
LIBXL_LIBS =
LIBXL_LIBS = $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxenstore) $(LDLIBS_libblktapctl) $(PTYFUNCS_LIBS) $(LIBUUID_LIBS)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_LIBS += $(LIBNL3_LIBS)
+endif
CFLAGS_LIBXL += $(CFLAGS_libxenctrl)
CFLAGS_LIBXL += $(CFLAGS_libxenguest)
CFLAGS_LIBXL += $(CFLAGS_libxenstore)
CFLAGS_LIBXL += $(CFLAGS_libblktapctl)
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
+endif
CFLAGS_LIBXL += -Wshadow
LIBXL_LIBS-$(CONFIG_ARM) += -lfdt
@@ -43,6 +49,13 @@ LIBXL_OBJS-y += libxl_blktap2.o
else
LIBXL_OBJS-y += libxl_noblktap2.o
endif
+
+ifeq ($(CONFIG_REMUS_NETBUF),y)
+LIBXL_OBJS-y += libxl_netbuffer.o
+else
+LIBXL_OBJS-y += libxl_nonetbuffer.o
+endif
+
LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4bc042b..c3e95e6 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2480,6 +2480,7 @@ typedef struct libxl__save_helper_state {
* marshalling and xc callback functions */
} libxl__save_helper_state;
+_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
/*----- Domain suspend (save) state structure -----*/
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
new file mode 100644
index 0000000..52d593c
--- /dev/null
+++ b/tools/libxl/libxl_netbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+ return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
new file mode 100644
index 0000000..1c72a7f
--- /dev/null
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2014
+ * Author Shriram Rajagopalan <rshriram@cs.ubc.ca>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+int libxl__netbuffer_enabled(libxl__gc *gc)
+{
+ return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
1.9.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (2 preceding siblings ...)
2014-07-28 9:23 ` [PATCH v18 03/11] autoconf: add libnl3 dependency for Remus network buffering support Yang Hongyang
@ 2014-07-28 9:23 ` Yang Hongyang
2014-08-07 18:30 ` Ian Jackson
2014-07-28 9:23 ` [PATCH v18 05/11] libxl/remus: setup and control network output buffering Yang Hongyang
` (7 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:23 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Introduce an abstract device layer that allows the Remus
logic in libxl to control a guest's devices in a device-agnostic
manner. The device layer also exposes a set of internal interfaces
that a device type must implement, if it wishes to support Remus.
The following API are exposed to libxl:

One-time configuration operations:
  *libxl__remus_devices_setup
    > Enable output buffering for NICs, setup disk replication, etc.
  *libxl__remus_devices_teardown
    > Disable network output buffering and disk replication;
      teardown any associated external setups like qdiscs for NICs.

Operations executed every checkpoint (in order of invocation):
  *libxl__remus_devices_postsuspend
  *libxl__remus_devices_preresume
  *libxl__remus_devices_commit

Each device type needs to implement the interfaces specified in
the libxl__remus_device_subkind_ops if it wishes to support Remus.

The high-level control flow through the Remus device layer is shown below:

xl remus
  |-> libxl_domain_remus_start
    |-> libxl__remus_devices_setup
    |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
    ...
    |-> On backup failure/network error/other errors
        libxl__remus_devices_teardown
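
The control flow above can be approximated by a small, synchronous sketch of an ops table and a driver loop. This is a deliberate simplification under stated assumptions: the interfaces in the patch are asynchronous and callback-driven, and `struct dev_ops`, `protect`, `nic_ops` and the string log are hypothetical names introduced only for illustration. It does mirror two rules stated in the patch: checkpoint ops may be NULL and are then treated as no-ops, and teardown is called even if setup fails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, synchronous simplification of a per-device-type ops table. */
struct dev_ops {
    int (*setup)(char *log);
    void (*teardown)(char *log);
    /* Checkpoint ops may be NULL, meaning "nothing to do". */
    int (*postsuspend)(char *log);
    int (*preresume)(char *log);
    int (*commit)(char *log);
};

/* Treat a missing op as a successful no-op. */
static int call_optional(int (*op)(char *), char *log)
{
    return op ? op(log) : 0;
}

/* Run setup, then up to `rounds` checkpoints, then teardown.
 * Teardown runs on every path, including setup failure. */
int protect(const struct dev_ops *ops, int rounds, char *log)
{
    int rc = ops->setup(log);
    for (int i = 0; !rc && i < rounds; i++) {
        rc = call_optional(ops->postsuspend, log);
        if (!rc) rc = call_optional(ops->preresume, log);
        if (!rc) rc = call_optional(ops->commit, log);
    }
    ops->teardown(log);
    return rc;
}

/* Example device type that implements everything except preresume. */
static int nic_setup(char *log) { strcat(log, "S"); return 0; }
static void nic_teardown(char *log) { strcat(log, "T"); }
static int nic_postsuspend(char *log) { strcat(log, "p"); return 0; }
static int nic_commit(char *log) { strcat(log, "c"); return 0; }

const struct dev_ops nic_ops = {
    .setup = nic_setup,
    .teardown = nic_teardown,
    .postsuspend = nic_postsuspend,
    .preresume = NULL,
    .commit = nic_commit,
};
```

In the real series the checkpoint loop has no fixed round count; it continues until the backup fails or an error occurs, at which point teardown is triggered.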
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
For comments:
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
tools/libxl/Makefile | 2 +
tools/libxl/libxl.c | 47 +++++-
tools/libxl/libxl_dom.c | 172 ++++++++++++++++++++--
tools/libxl/libxl_internal.h | 177 +++++++++++++++++++++++
tools/libxl/libxl_remus_device.c | 298 +++++++++++++++++++++++++++++++++++++++
tools/libxl/libxl_types.idl | 2 +
6 files changed, 679 insertions(+), 19 deletions(-)
create mode 100644 tools/libxl/libxl_remus_device.c
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index eb63510..202f1bb 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,6 +56,8 @@ else
LIBXL_OBJS-y += libxl_nonetbuffer.o
endif
+LIBXL_OBJS-y += libxl_remus_device.o
+
LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 3526539..95eead8 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -781,6 +781,10 @@ out:
return ptr;
}
+static void libxl__remus_setup_done(libxl__egc *egc,
+ libxl__remus_devices_state *rds, int rc);
+static void libxl__remus_setup_failed(libxl__egc *egc,
+ libxl__remus_devices_state *rds, int rc);
static void remus_failover_cb(libxl__egc *egc,
libxl__domain_suspend_state *dss, int rc);
@@ -812,16 +816,51 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
assert(info);
- /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+ /* Convenience aliases */
+ libxl__remus_devices_state *const rds = &dss->rds;
+ rds->ao = ao;
+ rds->egc = egc;
+ rds->domid = domid;
+ rds->callback = libxl__remus_setup_done;
/* Point of no return */
- libxl__domain_suspend(egc, dss);
+ libxl__remus_devices_setup(egc, rds);
return AO_INPROGRESS;
out:
return AO_ABORT(rc);
}
+static void libxl__remus_setup_done(libxl__egc *egc,
+ libxl__remus_devices_state *rds, int rc)
+{
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+ STATE_AO_GC(dss->ao);
+
+ if (!rc) {
+ libxl__domain_suspend(egc, dss);
+ return;
+ }
+
+ LOG(ERROR, "Remus: failed to setup device for guest with domid %u, rc %d",
+ dss->domid, rc);
+ rds->callback = libxl__remus_setup_failed;
+ libxl__remus_devices_teardown(egc, rds);
+}
+
+static void libxl__remus_setup_failed(libxl__egc *egc,
+ libxl__remus_devices_state *rds, int rc)
+{
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+ STATE_AO_GC(dss->ao);
+
+ if (rc)
+ LOG(ERROR, "Remus: failed to teardown device after setup failed"
+ " for guest with domid %u, rc %d", dss->domid, rc);
+
+ dss->callback(egc, dss, rc);
+}
+
static void remus_failover_cb(libxl__egc *egc,
libxl__domain_suspend_state *dss, int rc)
{
@@ -831,10 +870,6 @@ static void remus_failover_cb(libxl__egc *egc,
* backup died or some network error occurred preventing us
* from sending checkpoints.
*/
-
- /* TBD: Remus cleanup - i.e. detach qdisc, release other
- * resources.
- */
libxl__ao_complete(egc, ao, rc);
}
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 83eb29a..38e22f3 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -798,8 +798,6 @@ static void domain_suspend_done(libxl__egc *egc,
libxl__domain_suspend_state *dss, int rc);
static void domain_suspend_callback_common_done(libxl__egc *egc,
libxl__domain_suspend_state *dss, int ok);
-static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
- libxl__domain_suspend_state *dss, int ok);
/*----- complicated callback, called by xc_domain_save -----*/
@@ -1461,6 +1459,14 @@ static void domain_suspend_callback_common_done(libxl__egc *egc,
}
/*----- remus callbacks -----*/
+static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
+ libxl__domain_suspend_state *dss, int ok);
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc);
+static void remus_devices_preresume_cb(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc);
static void libxl__remus_domain_suspend_callback(void *data)
{
@@ -1475,32 +1481,74 @@ static void libxl__remus_domain_suspend_callback(void *data)
static void remus_domain_suspend_callback_common_done(libxl__egc *egc,
libxl__domain_suspend_state *dss, int ok)
{
- /* REMUS TODO: Issue disk and network checkpoint reqs. */
+ if (!ok)
+ goto out;
+
+ libxl__remus_devices_state *const rds = &dss->rds;
+ rds->callback = remus_devices_postsuspend_cb;
+ libxl__remus_devices_postsuspend(egc, rds);
+ return;
+
+out:
libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
}
-static void libxl__remus_domain_resume_callback(void *data)
+static void remus_devices_postsuspend_cb(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc)
{
int ok = 0;
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+ if (rc)
+ goto out;
+
+ ok = 1;
+
+out:
+ libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
+}
+
+static void libxl__remus_domain_resume_callback(void *data)
+{
libxl__save_helper_state *shs = data;
libxl__egc *egc = shs->egc;
libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
STATE_AO_GC(dss->ao);
- /* Resumes the domain and the device model */
- if (libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+ libxl__remus_devices_state *const rds = &dss->rds;
+ rds->callback = remus_devices_preresume_cb;
+ libxl__remus_devices_preresume(egc, rds);
+}
+
+static void remus_devices_preresume_cb(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc)
+{
+ int ok = 0;
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+ STATE_AO_GC(dss->ao);
+
+ if (rc)
goto out;
- /* REMUS TODO: Deal with disk. Start a new network output buffer */
- ok = 1;
+ /* Resumes the domain and the device model */
+ if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
+ ok = 1;
+
out:
- libxl__xc_domain_saverestore_async_callback_done(egc, shs, ok);
+ libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, ok);
}
/*----- remus asynchronous checkpoint callback -----*/
static void remus_checkpoint_dm_saved(libxl__egc *egc,
libxl__domain_suspend_state *dss, int rc);
+static void remus_devices_commit_cb(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc);
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+ const struct timeval *requested_abs);
static void libxl__remus_domain_checkpoint_callback(void *data)
{
@@ -1520,10 +1568,76 @@ static void libxl__remus_domain_checkpoint_callback(void *data)
static void remus_checkpoint_dm_saved(libxl__egc *egc,
libxl__domain_suspend_state *dss, int rc)
{
- /* REMUS TODO: Wait for disk and memory ack, release network buffer */
- /* REMUS TODO: make this asynchronous */
- assert(!rc); /* REMUS TODO handle this error properly */
- usleep(dss->interval * 1000);
+ /* Convenience aliases */
+ libxl__remus_devices_state *const rds = &dss->rds;
+
+ STATE_AO_GC(dss->ao);
+
+ if (rc) {
+ LOG(ERROR, "Failed to save device model. Terminating Remus..");
+ goto out;
+ }
+
+ rds->callback = remus_devices_commit_cb;
+ libxl__remus_devices_commit(egc, rds);
+
+ return;
+
+out:
+ libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_devices_commit_cb(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc)
+{
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+
+ STATE_AO_GC(dss->ao);
+
+ if (rc) {
+ LOG(ERROR, "Failed to do device commit op."
+ " Terminating Remus..");
+ goto out;
+ }
+
+ /*
+ * At this point, we have successfully checkpointed the guest and
+ * committed it at the backup. We'll come back after the checkpoint
+ * interval to checkpoint the guest again. Until then, let the guest
+ * continue execution.
+ */
+
+ /* Set checkpoint interval timeout */
+ rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
+ remus_next_checkpoint,
+ dss->interval);
+
+ if (rc) {
+ LOG(ERROR, "unable to register timeout for next epoch."
+ " Terminating Remus..");
+ goto out;
+ }
+
+ return;
+
+out:
+ libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 0);
+}
+
+static void remus_next_checkpoint(libxl__egc *egc, libxl__ev_time *ev,
+ const struct timeval *requested_abs)
+{
+ libxl__domain_suspend_state *dss =
+ CONTAINER_OF(ev, *dss, checkpoint_timeout);
+
+ STATE_AO_GC(dss->ao);
+
+ /*
+ * Time to checkpoint the guest again. We return 1 to libxc
+ * (xc_domain_save.c) in order to continue executing the infinite loop
+ * (suspend, checkpoint, resume) in xc_domain_save().
+ */
libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);
}
@@ -1738,6 +1852,10 @@ static void save_device_model_datacopier_done(libxl__egc *egc,
dss->save_dm_callback(egc, dss, our_rc);
}
+static void remus_teardown_done(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc);
+
static void domain_suspend_done(libxl__egc *egc,
libxl__domain_suspend_state *dss, int rc)
{
@@ -1752,6 +1870,34 @@ static void domain_suspend_done(libxl__egc *egc,
xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
dss->guest_evtchn.port, &dss->guest_evtchn_lockfd);
+ if (!dss->remus) {
+ remus_teardown_done(egc, &dss->rds, rc);
+ return;
+ }
+
+ /*
+ * With Remus, if we reach this point, it means either
+ * backup died or some network error occurred preventing us
+ * from sending checkpoints. Teardown the network buffers and
+ * release netlink resources. This is an async op.
+ */
+ LOG(WARN, "Remus: Domain suspend terminated with rc %d,"
+ " teardown Remus devices...", rc);
+ dss->rds.callback = remus_teardown_done;
+ libxl__remus_devices_teardown(egc, &dss->rds);
+}
+
+static void remus_teardown_done(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ int rc)
+{
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
+ STATE_AO_GC(dss->ao);
+
+ if (rc)
+ LOG(ERROR, "Remus: failed to teardown device for guest with domid %u,"
+ " rc %d", dss->domid, rc);
+
dss->callback(egc, dss, rc);
}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c3e95e6..91ba122 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2480,6 +2480,181 @@ typedef struct libxl__save_helper_state {
* marshalling and xc callback functions */
} libxl__save_helper_state;
+/*----- remus device related state structure -----*/
+/*
+ * The abstract Remus device layer exposes a common
+ * set of API to [external] libxl for manipulating devices attached to
+ * a guest protected by Remus. The device layer also exposes a set of
+ * [internal] interfaces that every device type must implement.
+ *
+ * The following API are exposed to libxl:
+ *
+ * One-time configuration operations:
+ * +libxl__remus_devices_setup
+ * > Enable output buffering for NICs, setup disk replication, etc.
+ * +libxl__remus_devices_teardown
+ * > Disable output buffering and disk replication; teardown any
+ * associated external setups like qdiscs for NICs.
+ *
+ * Operations executed every checkpoint (in order of invocation):
+ * +libxl__remus_devices_postsuspend
+ * +libxl__remus_devices_preresume
+ * +libxl__remus_devices_commit
+ *
+ * Each device type needs to implement the interfaces specified in
+ * the libxl__remus_device_subkind_ops if it wishes to support Remus.
+ *
+ * The high-level control flow through the Remus device layer is shown below:
+ *
+ * xl remus
+ * |-> libxl_domain_remus_start
+ * |-> libxl__remus_devices_setup
+ * |-> Per-checkpoint libxl__remus_devices_[postsuspend,preresume,commit]
+ * ...
+ * |-> On backup failure, network error or other internal errors:
+ * libxl__remus_devices_teardown
+ */
+
+typedef enum libxl__remus_device_kind {
+ LIBXL__REMUS_DEVICE_NIC = (1 << 0),
+ LIBXL__REMUS_DEVICE_DISK = (1 << 1),
+} libxl__remus_device_kind;
+
+typedef struct libxl__remus_device libxl__remus_device;
+typedef struct libxl__remus_devices_state libxl__remus_devices_state;
+typedef struct libxl__remus_device_subkind_ops libxl__remus_device_subkind_ops;
+
+/*
+ * Interfaces to be implemented by every device type that wishes to
+ * support Remus. Functions must be implemented unless otherwise
+ * stated. Many of these functions are asynchronous. They call
+ * dev->aodev.callback when done. The actual implementations may be
+ * synchronous and call dev->aodev.callback directly (as the last
+ * thing they do).
+ */
+struct libxl__remus_device_subkind_ops {
+ /* the device kind this ops belongs to... */
+ libxl__remus_device_kind kind;
+
+ /*
+ * init() and cleanup() relate to the subkind-specific state in
+ * the libxl ctx, not to any specific device.
+ * Synchronous. cleanup() cannot fail.
+ */
+ int (*init)(libxl__remus_devices_state *rds);
+ void (*cleanup)(libxl__remus_devices_state *rds);
+
+ /*
+ * Checkpoint operations. May be NULL, meaning the op is not
+ * implemented and the caller should treat them as a no-op (and do
+ * nothing when checkpointing).
+ * Asynchronous.
+ */
+
+ void (*postsuspend)(libxl__remus_device *dev);
+ void (*preresume)(libxl__remus_device *dev);
+ void (*commit)(libxl__remus_device *dev);
+
+ /*
+ * setup() and teardown() refer to the actual remus device.
+ * Asynchronous.
+ * teardown is called even if setup fails.
+ */
+ /*
+ * setup() should first determine whether the subkind matches the specific
+ * device. If matched, the device will then be managed with this set of
+ * subkind operations.
+ * Yields 0 if the device was successfully set up,
+ * ERROR_REMUS_DEVOPS_DOES_NOT_MATCH if the ops does not match the device;
+ * any other rc indicates failure.
+ */
+ void (*setup)(libxl__remus_device *dev);
+ void (*teardown)(libxl__remus_device *dev);
+};
+
+typedef void libxl__remus_callback(libxl__egc *,
+ libxl__remus_devices_state *, int rc);
+
+/*
+ * State associated with a remus invocation, including parameters
+ * passed to the remus abstract device layer by the remus
+ * save/restore machinery.
+ */
+struct libxl__remus_devices_state {
+ /*---- must be set by caller of libxl__remus_device_(setup|teardown) ----*/
+
+ libxl__ao *ao;
+ libxl__egc *egc;
+ uint32_t domid;
+ libxl__remus_callback *callback;
+ int device_kind_flags;
+
+ /*----- private for abstract layer only -----*/
+
+ int num_devices;
+ /*
+ * this array is allocated before setup the remus devices by the
+ * remus abstract layer.
+ * the size of this array is 'num_devices', which is the total number
+ * of libxl nic devices and disk devices(num_nics + num_disks).
+ */
+ libxl__remus_device **dev;
+
+ libxl_device_nic *nics;
+ int num_nics;
+ libxl_device_disk *disks;
+ int num_disks;
+
+ libxl__multidev multidev;
+};
+
+/*
+ * Information about a single device being handled by remus.
+ * Allocated by the remus abstract layer.
+ */
+struct libxl__remus_device {
+ /*----- shared between abstract and concrete layers -----*/
+ /*
+ * if this is true, that means the subkind ops matched the
+ * device and we have actually set up the device no matter
+ * setup succeed or not.
+ */
+ int set_up;
+
+ /*----- set by remus device abstract layer -----*/
+ /* libxl__device_* which this remus device is related to */
+ const void *backend_dev;
+ libxl__remus_device_kind kind;
+ libxl__remus_devices_state *rds;
+ libxl__ao_device aodev;
+
+ /*----- private for abstract layer only -----*/
+
+ /*
+ * Control and state variables for the asynchronous callback
+ * based loops which iterate over device subkinds, and over
+ * individual devices.
+ */
+ int ops_index;
+ const libxl__remus_device_subkind_ops *ops;
+
+ /*----- private for concrete (device-specific) layer -----*/
+
+ /* concrete device's private data */
+ void *concrete_data;
+};
+
+/* the following 5 APIs are async ops, call rds->callback when done */
+_hidden void libxl__remus_devices_setup(libxl__egc *egc,
+ libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_teardown(libxl__egc *egc,
+ libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_postsuspend(libxl__egc *egc,
+ libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_preresume(libxl__egc *egc,
+ libxl__remus_devices_state *rds);
+_hidden void libxl__remus_devices_commit(libxl__egc *egc,
+ libxl__remus_devices_state *rds);
_hidden int libxl__netbuffer_enabled(libxl__gc *gc);
/*----- Domain suspend (save) state structure -----*/
@@ -2520,6 +2695,8 @@ struct libxl__domain_suspend_state {
libxl__ev_xswatch guest_watch;
libxl__ev_time guest_timeout;
const char *dm_savefile;
+ libxl__remus_devices_state rds;
+ libxl__ev_time checkpoint_timeout; /* used for Remus checkpoint */
int interval; /* checkpoint interval (for Remus) */
libxl__save_helper_state shs;
libxl__logdirty_switch logdirty;
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
new file mode 100644
index 0000000..9ca9468
--- /dev/null
+++ b/tools/libxl/libxl_remus_device.c
@@ -0,0 +1,298 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Yang Hongyang <yanghy@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+static const libxl__remus_device_subkind_ops *remus_ops[] = {
+ NULL,
+};
+
+/*----- helper functions -----*/
+
+static int init_device_subkind(libxl__remus_devices_state *rds)
+{
+ int rc;
+ const libxl__remus_device_subkind_ops **ops;
+
+ for (ops = remus_ops; *ops; ops++) {
+ rc = (*ops)->init(rds);
+ if (rc)
+ goto out;
+ }
+
+ rc = 0;
+out:
+ return rc;
+
+}
+
+static void cleanup_device_subkind(libxl__remus_devices_state *rds)
+{
+ const libxl__remus_device_subkind_ops **ops;
+
+ for (ops = remus_ops; *ops; ops++)
+ (*ops)->cleanup(rds);
+}
+
+/*----- setup() and teardown() -----*/
+
+/* callbacks */
+
+static void devices_setup_cb(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int rc);
+static void devices_teardown_cb(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int rc);
+
+/* remus device setup and teardown */
+
+static libxl__remus_device* remus_device_init(libxl__egc *egc,
+ libxl__remus_devices_state *rds,
+ libxl__remus_device_kind kind,
+ void *libxl_dev)
+{
+ libxl__remus_device *dev = NULL;
+
+ STATE_AO_GC(rds->ao);
+ GCNEW(dev);
+ dev->backend_dev = libxl_dev;
+ dev->kind = kind;
+ dev->rds = rds;
+ dev->ops_index = -1;
+
+ return dev;
+}
+
+static void remus_devices_setup(libxl__egc *egc,
+ libxl__remus_devices_state *rds);
+
+void libxl__remus_devices_setup(libxl__egc *egc, libxl__remus_devices_state *rds)
+{
+ int i, rc;
+
+ STATE_AO_GC(rds->ao);
+
+ rc = init_device_subkind(rds);
+ if (rc)
+ goto out;
+
+ rds->num_devices = 0;
+ rds->num_nics = 0;
+ rds->num_disks = 0;
+
+ if (rds->device_kind_flags & LIBXL__REMUS_DEVICE_NIC)
+ rds->nics = libxl_device_nic_list(CTX, rds->domid, &rds->num_nics);
+
+ if (rds->device_kind_flags & LIBXL__REMUS_DEVICE_DISK)
+ rds->disks = libxl_device_disk_list(CTX, rds->domid, &rds->num_disks);
+
+ if (rds->num_nics == 0 && rds->num_disks == 0)
+ goto out;
+
+ GCNEW_ARRAY(rds->dev, rds->num_nics + rds->num_disks);
+
+ for (i = 0; i < rds->num_nics; i++) {
+ rds->dev[rds->num_devices++] = remus_device_init(egc, rds,
+ LIBXL__REMUS_DEVICE_NIC,
+ &rds->nics[i]);
+ }
+
+ for (i = 0; i < rds->num_disks; i++) {
+ rds->dev[rds->num_devices++] = remus_device_init(egc, rds,
+ LIBXL__REMUS_DEVICE_DISK,
+ &rds->disks[i]);
+ }
+
+ remus_devices_setup(egc, rds);
+
+ return;
+
+out:
+ rds->callback(egc, rds, rc);
+}
+
+static void remus_devices_setup(libxl__egc *egc,
+ libxl__remus_devices_state *rds)
+{
+ int i, rc;
+ libxl__remus_device *dev;
+
+ STATE_AO_GC(rds->ao);
+
+ libxl__multidev_begin(ao, &rds->multidev);
+ rds->multidev.callback = devices_setup_cb;
+ for (i = 0; i < rds->num_devices; i++) {
+ dev = rds->dev[i];
+ if (dev->set_up)
+ continue;
+
+ /* find available ops */
+ do {
+ dev->ops = remus_ops[++dev->ops_index];
+ if (!dev->ops) {
+ rc = ERROR_REMUS_DEVICE_NOT_SUPPORTED;
+ goto out;
+ }
+ } while (dev->ops->kind != dev->kind);
+
+ libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+ dev->ops->setup(dev);
+ }
+
+ rc = 0;
+out:
+ libxl__multidev_prepared(egc, &rds->multidev, rc);
+}
+
+static void devices_setup_cb(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int rc)
+{
+ int i;
+ libxl__remus_device *dev;
+
+ STATE_AO_GC(multidev->ao);
+
+ /* Convenience aliases */
+ libxl__remus_devices_state *const rds =
+ CONTAINER_OF(multidev, *rds, multidev);
+
+ /* find the error that was not ERROR_REMUS_DEVOPS_DOES_NOT_MATCH */
+ for (i = 0; i < rds->num_devices; i++) {
+ dev = rds->dev[i];
+
+ if (!dev->aodev.rc || dev->aodev.rc == ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
+ continue;
+
+ rc = dev->aodev.rc;
+ goto out;
+ }
+
+ /* if the error is still ERROR_REMUS_DEVOPS_DOES_NOT_MATCH, begin next iter */
+ if (rc == ERROR_REMUS_DEVOPS_DOES_NOT_MATCH) {
+ remus_devices_setup(egc, rds);
+ return;
+ }
+
+out:
+ rds->callback(egc, rds, rc);
+}
+
+void libxl__remus_devices_teardown(libxl__egc *egc,
+ libxl__remus_devices_state *rds)
+{
+ int i;
+ libxl__remus_device *dev;
+
+ STATE_AO_GC(rds->ao);
+
+ libxl__multidev_begin(ao, &rds->multidev);
+ rds->multidev.callback = devices_teardown_cb;
+ for (i = 0; i < rds->num_devices; i++) {
+ dev = rds->dev[i];
+ if (!dev->ops || !dev->set_up)
+ continue;
+
+ libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);
+ dev->ops->teardown(dev);
+ }
+
+ libxl__multidev_prepared(egc, &rds->multidev, 0);
+}
+
+static void devices_teardown_cb(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int rc)
+{
+ int i;
+
+ STATE_AO_GC(multidev->ao);
+
+ /* Convenience aliases */
+ libxl__remus_devices_state *const rds =
+ CONTAINER_OF(multidev, *rds, multidev);
+
+ /* clean nic */
+ for (i = 0; i < rds->num_nics; i++)
+ libxl_device_nic_dispose(&rds->nics[i]);
+ free(rds->nics);
+ rds->nics = NULL;
+ rds->num_nics = 0;
+
+ /* clean disk */
+ for (i = 0; i < rds->num_disks; i++)
+ libxl_device_disk_dispose(&rds->disks[i]);
+ free(rds->disks);
+ rds->disks = NULL;
+ rds->num_disks = 0;
+
+ cleanup_device_subkind(rds);
+
+ rds->callback(egc, rds, rc);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int rc);
+
+/* API implementations */
+
+#define define_remus_checkpoint_api(api) \
+void libxl__remus_devices_##api(libxl__egc *egc, \
+ libxl__remus_devices_state *rds) \
+{ \
+ int i; \
+ libxl__remus_device *dev; \
+ \
+ STATE_AO_GC(rds->ao); \
+ \
+ libxl__multidev_begin(ao, &rds->multidev); \
+ rds->multidev.callback = devices_checkpoint_cb; \
+ for (i = 0; i < rds->num_devices; i++) { \
+ dev = rds->dev[i]; \
+ if (!dev->set_up || !dev->ops->api) \
+ continue; \
+ libxl__multidev_prepare_with_aodev(&rds->multidev, &dev->aodev);\
+ dev->ops->api(dev); \
+ } \
+ \
+ libxl__multidev_prepared(egc, &rds->multidev, 0); \
+}
+
+define_remus_checkpoint_api(postsuspend);
+
+define_remus_checkpoint_api(preresume);
+
+define_remus_checkpoint_api(commit);
+
+static void devices_checkpoint_cb(libxl__egc *egc,
+ libxl__multidev *multidev,
+ int rc)
+{
+ STATE_AO_GC(multidev->ao);
+
+ /* Convenience aliases */
+ libxl__remus_devices_state *const rds =
+ CONTAINER_OF(multidev, *rds, multidev);
+
+ rds->callback(egc, rds, rc);
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a412f9c..25bd8f3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -58,6 +58,8 @@ libxl_error = Enumeration("error", [
(-12, "OSEVENT_REG_FAIL"),
(-13, "BUFFERFULL"),
(-14, "UNKNOWN_CHILD"),
+ (-15, "REMUS_DEVOPS_DOES_NOT_MATCH"),
+ (-16, "REMUS_DEVICE_NOT_SUPPORTED"),
], value_namespace = "")
libxl_domain_type = Enumeration("domain_type", [
--
1.9.1
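The subkind selection loop in remus_devices_setup() above can be looked at in isolation. What follows is a minimal standalone sketch, not libxl code: kind_t, ops_t, device_t and next_matching_ops() are hypothetical stand-ins for libxl__remus_device_kind, libxl__remus_device_subkind_ops, libxl__remus_device and the do/while loop in the patch. It only illustrates how ops_index (starting at -1) walks a NULL-terminated ops table until an entry of the right kind is found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libxl types used by the patch. */
typedef enum { KIND_NIC = 1 << 0, KIND_DISK = 1 << 1 } kind_t;

typedef struct {
    kind_t kind;
    const char *name;   /* illustration only */
} ops_t;

typedef struct {
    kind_t kind;
    int ops_index;      /* starts at -1, as in remus_device_init() */
    const ops_t *ops;
} device_t;

/*
 * Advance dev->ops_index to the next entry of the NULL-terminated
 * table whose kind equals dev->kind.
 *
 * Returns 0 on success and -1 once the table is exhausted (the patch
 * reports ERROR_REMUS_DEVICE_NOT_SUPPORTED at that point).  Must not
 * be called again for the same device after it has returned -1,
 * because ops_index then already sits on the terminator.
 */
static int next_matching_ops(device_t *dev, const ops_t *const *table)
{
    assert(dev && table);
    do {
        dev->ops = table[++dev->ops_index];
        if (!dev->ops)
            return -1;
    } while (dev->ops->kind != dev->kind);
    return 0;
}
```

Because the index is kept in the device, a later call resumes after the previously tried entry, which is what lets the setup callback retry with the next candidate subkind when the previous one reported that it does not match.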
* Re: [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer
2014-07-28 9:23 ` [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
@ 2014-08-07 18:30 ` Ian Jackson
2014-08-27 1:46 ` Hongyang Yang
0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2014-08-07 18:30 UTC (permalink / raw)
To: Yang Hongyang
Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
xen-devel, rshriram, ian.campbell
Yang Hongyang writes ("[PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer"):
> Introduce an abstract device layer that allows the Remus
> logic in libxl to control a guest's devices in a device-agnostic
> manner. The device layer also exposes a set of internal interfaces
> that a device type must implement, if it wishes to support Remus.
Thanks. I think this is converging. I have mostly nits as comments
now. I have only two nontrivial comments: one about your use of
multidev which I think needs to be improved, and the other is about
the libxl__remus_device_kind enum (which you are already aware of).
> +static void remus_devices_preresume_cb(libxl__egc *egc,
> + libxl__remus_devices_state *rds,
> + int rc)
> +{
> + int ok = 0;
> + libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
> + STATE_AO_GC(dss->ao);
> +
> + if (rc)
> goto out;
>
> - /* REMUS TODO: Deal with disk. Start a new network output buffer */
> - ok = 1;
> + /* Resumes the domain and the device model */
> + if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
> + ok = 1;
Again, this should use the standard `goto out' error handling style.
In this case that means:
rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
if (rc) goto out;
ok = 1;
out:
> +static void remus_devices_commit_cb(libxl__egc *egc,
> + libxl__remus_devices_state *rds,
> + int rc)
> +{
...
> + /* Set checkpoint interval timeout */
> + rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
> + remus_next_checkpoint,
> + dss->interval);
> +
> + if (rc) {
> + LOG(ERROR, "unable to register timeout for next epoch."
> + " Terminating Remus..");
> + goto out;
> + }
There is no need to log failures of libxl__ev_time_register_rel et
al. See the comment in libxl_internal.h near line 691. It is
sufficient to do
if (rc) goto out;
> +typedef enum libxl__remus_device_kind {
> + LIBXL__REMUS_DEVICE_NIC = (1 << 0),
> + LIBXL__REMUS_DEVICE_DISK = (1 << 1),
> +} libxl__remus_device_kind;
We still need to talk about this, and the comments I had about the
vtables.
> +typedef struct libxl__remus_device libxl__remus_device;
> +typedef struct libxl__remus_devices_state libxl__remus_devices_state;
> +typedef struct libxl__remus_device_subkind_ops libxl__remus_device_subkind_ops;
> +
> +/*
> + * Interfaces to be implemented by every device type that wishes to
> + * support Remus. Functions must be implemented unless otherwise
> + * stated. Many of these functions are asynchronous. They call
> + * dev->aodev.callback when done. The actual implementations may be
> + * synchronous and call dev->aodev.callback directly (as the last
> + * thing they do).
> + */
> +struct libxl__remus_device_subkind_ops {
> + /* the device kind this ops belongs to... */
> + libxl__remus_device_kind kind;
> +
> + /*
> + * init() and cleanup() relate to the subkind-specific state in
> + * the libxl ctx, not to any specific device.
> + * Synchronous. cleanup() cannot fail.
> + */
> + int (*init)(libxl__remus_devices_state *rds);
> + void (*cleanup)(libxl__remus_devices_state *rds);
But actually they take a libxl__remus_devices_state.
Either the state is global for all simultaneous remus invocations
with this libxl_ctx, in which case init and cleanup should not take
any libxl__remus_devices_state.
Or the state is per remus invocation, in which case the comment is
wrong.
You also need to document the error behaviour. From the call site I
think something like:
Before the first call to init, the subkind-specific state will be
all-bits-zero. cleanup will be called whether or not init
succeeded.
This is a similar situation to the one where I asked you to document
the same thing about `teardown'.
And if this is global state in the libxl_ctx, you have to also say:
init must be idempotent; it will be called multiple times,
possibly even after it has been called and failed.
And if that is the semantics I think something like `ensure_inited' is
probably correct for its name.
> + int num_devices;
> + /*
> + * this array is allocated before setup the remus devices by the
> + * remus abstract layer.
> + * the size of this array is 'num_devices', which is the total number
> + * of libxl nic devices and disk devices(num_nics + num_disks).
> + */
> + libxl__remus_device **dev;
(As I said before) this comment leaves some questions unanswered:
What proportion of the devs array is initialised at any one time ?
May the devs array contain null pointers and what do they mean ? etc.
(And, sorry for not noticing this last time, but I think this variable
needs to be called `devs' rather than `dev'.)
> +/*
> + * Information about a single device being handled by remus.
> + * Allocated by the remus abstract layer.
> + */
> +struct libxl__remus_device {
> + /*----- shared between abstract and concrete layers -----*/
> + /*
> + * if this is true, that means the subkind ops matched the
> + * device and we have actually set up the device no matter
> + * setup succeed or not.
> + */
> + int set_up;
I don't understand this. The protocol documented in
libxl__remus_device_subkind_ops seems to be how the subkind
communicates to the abstract layer whether the device was successfully
set up. Is this variable in fact solely for the abstract layer ?
Also, "we have actually set up the device" and "setup succeeded" seem
to be the same thing.
(Also, can it be a boolean?)
> + /* find the error that was not ERROR_REMUS_DEVOPS_DOES_NOT_MATCH */
> + for (i = 0; i < rds->num_devices; i++) {
> + dev = rds->dev[i];
> +
> + if (!dev->aodev.rc || dev->aodev.rc == ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
This is quite tortuous. I think you probably want to do it
differently by having two layers of callback function:
You should probably make the multidev->callback only when you have
found the right subkind (or failed).
So the subkind should be told to use a different callback which is
handled here in the abstract type code. Then your abstract code can
iterate separately through each subkind, rather than hunting through
the innards of multidev.
(I think that accessing aodev->rc here is a layering violation.)
Thanks,
Ian.
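One possible reading of the two-layer callback suggestion in the review above is sketched here. This is an illustrative standalone model, not libxl code and not necessarily what the reviewer had in mind: every name (device, subkind_ops, start_device_setup, the RC_* constants) is hypothetical, the values -15 and -16 merely mirror the error numbers added to libxl_types.idl by the patch, and kind filtering is left out for brevity. The inner callback belongs to the abstract layer and absorbs "does not match" by trying the next subkind; the outer callback (which would be the multidev one) fires exactly once per device, only when a subkind has matched or setup has finally failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; -15/-16 mirror the patch's new errors. */
enum { RC_OK = 0, RC_DOES_NOT_MATCH = -15, RC_NOT_SUPPORTED = -16 };

typedef struct device device;
typedef void done_fn(device *dev, int rc);

typedef struct {
    /* Must call dev->subkind_done(dev, rc) as the last thing it does. */
    void (*setup)(device *dev);
} subkind_ops;

struct device {
    const subkind_ops *const *table; /* NULL-terminated list of subkinds */
    int ops_index;                   /* -1 before the first attempt */
    int id;
    done_fn *subkind_done;           /* inner callback (abstract layer) */
    done_fn *device_done;            /* outer callback, fired exactly once */
    int final_rc;
    int done_calls;
};

static void try_next_subkind(device *dev);

/* Inner callback: hide DOES_NOT_MATCH from the outer layer. */
static void subkind_setup_done(device *dev, int rc)
{
    if (rc == RC_DOES_NOT_MATCH) {
        try_next_subkind(dev);
        return;
    }
    dev->device_done(dev, rc);
}

static void try_next_subkind(device *dev)
{
    const subkind_ops *ops = dev->table[++dev->ops_index];
    if (!ops) {
        dev->device_done(dev, RC_NOT_SUPPORTED);
        return;
    }
    ops->setup(dev);
}

static void start_device_setup(device *dev, done_fn *device_done)
{
    assert(dev && device_done);
    dev->ops_index = -1;
    dev->subkind_done = subkind_setup_done;
    dev->device_done = device_done;
    try_next_subkind(dev);
}

/* Two toy subkinds and an outer callback, for demonstration only. */
static void setup_never_matches(device *dev)
{
    dev->subkind_done(dev, RC_DOES_NOT_MATCH);
}

static void setup_even_ids(device *dev)
{
    dev->subkind_done(dev, dev->id % 2 == 0 ? RC_OK : RC_DOES_NOT_MATCH);
}

static void record_result(device *dev, int rc)
{
    dev->final_rc = rc;
    dev->done_calls++;
}

static const subkind_ops never_ops = { setup_never_matches };
static const subkind_ops even_ops = { setup_even_ids };
static const subkind_ops *const demo_table[] = { &never_ops, &even_ops, NULL };
```

With this shape the aggregate callback never needs to inspect per-device result codes to decide whether another pass is required, which is the layering concern raised about reading aodev->rc.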
* Re: [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer
2014-08-07 18:30 ` Ian Jackson
@ 2014-08-27 1:46 ` Hongyang Yang
2014-08-27 2:21 ` Ian Jackson
0 siblings, 1 reply; 20+ messages in thread
From: Hongyang Yang @ 2014-08-27 1:46 UTC (permalink / raw)
To: Ian Jackson
Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
xen-devel, rshriram, ian.campbell
Hi Ian,
Thanks for the review; I'm addressing these comments. What do you think of
the rest of the patches? Do you intend to review them all this time, or just
stop here and review the next version?
On 08/08/2014 02:30 AM, Ian Jackson wrote:
> Yang Hongyang writes ("[PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer"):
>> Introduce an abstract device layer that allows the Remus
>> logic in libxl to control a guest's devices in a device-agnostic
>> manner. The device layer also exposes a set of internal interfaces
>> that a device type must implement, if it wishes to support Remus.
>
> Thanks. I think this is converging. I have mostly nits as comments
> now. I have only two nontrivial comments: one about your use of
> multidev which I think needs to be improved, and the other is about
> the libxl__remus_device_kind enum (which you are already aware of).
>
>
>
>> +static void remus_devices_preresume_cb(libxl__egc *egc,
>> + libxl__remus_devices_state *rds,
>> + int rc)
>> +{
>> + int ok = 0;
>> + libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
>> + STATE_AO_GC(dss->ao);
>> +
>> + if (rc)
>> goto out;
>>
>> - /* REMUS TODO: Deal with disk. Start a new network output buffer */
>> - ok = 1;
>> + /* Resumes the domain and the device model */
>> + if (!libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1))
>> + ok = 1;
>
> Again, this should use the standard `goto out' error handling style.
> In this case that means:
>
> rc = libxl__domain_resume(gc, dss->domid, /* Fast Suspend */1);
> if (rc) goto out;
>
> ok = 1;
> out:
>
>
>> +static void remus_devices_commit_cb(libxl__egc *egc,
>> + libxl__remus_devices_state *rds,
>> + int rc)
>> +{
> ...
>> + /* Set checkpoint interval timeout */
>> + rc = libxl__ev_time_register_rel(gc, &dss->checkpoint_timeout,
>> + remus_next_checkpoint,
>> + dss->interval);
>> +
>> + if (rc) {
>> + LOG(ERROR, "unable to register timeout for next epoch."
>> + " Terminating Remus..");
>> + goto out;
>> + }
>
> There is no need to log failures of libxl__ev_time_register_rel et
> al. See the comment in libxl_internal.h near line 691. It is
> sufficient to do
>
> if (rc) goto out;
>
>> +typedef enum libxl__remus_device_kind {
>> + LIBXL__REMUS_DEVICE_NIC = (1 << 0),
>> + LIBXL__REMUS_DEVICE_DISK = (1 << 1),
>> +} libxl__remus_device_kind;
>
> We still need to talk about this, and the comments I had about the
> vtables.
>
>> +typedef struct libxl__remus_device libxl__remus_device;
>> +typedef struct libxl__remus_devices_state libxl__remus_devices_state;
>> +typedef struct libxl__remus_device_subkind_ops libxl__remus_device_subkind_ops;
>> +
>> +/*
>> + * Interfaces to be implemented by every device type that wishes to
>> + * support Remus. Functions must be implemented unless otherwise
>> + * stated. Many of these functions are asynchronous. They call
>> + * dev->aodev.callback when done. The actual implementations may be
>> + * synchronous and call dev->aodev.callback directly (as the last
>> + * thing they do).
>> + */
>> +struct libxl__remus_device_subkind_ops {
>> + /* the device kind this ops belongs to... */
>> + libxl__remus_device_kind kind;
>> +
>> + /*
>> + * init() and cleanup() relate to the subkind-specific state in
>> + * the libxl ctx, not to any specific device.
>> + * Synchronous. cleanup() cannot fail.
>> + */
>> + int (*init)(libxl__remus_devices_state *rds);
>> + void (*cleanup)(libxl__remus_devices_state *rds);
>
> But actually they take a libxl__remus_devices_state.
>
> Either the state is global for all simultaneous remus invocations
> with this libxl_ctx, in which case init and cleanup should not take
> any libxl__remus_devices_state.
>
> Or the state is per remus invocation, in which case the comment is
> wrong.
>
> You also need to document the error behaviour. From the call site I
> think something like:
>
> Before the first call to init, the subkind-specific state will be
> all-bits-zero. cleanup will be called whether or not init
> succeeded.
>
> This is a similar situation to the one where I asked you to document
> the same thing about `teardown'.
>
>
> And if this is global state in the libxl_ctx, you have to also say:
>
> init must be idempotent; it will be called multiple times,
> possibly even after it has been called and failed.
>
> And if that is the semantics I think something like `ensure_inited' is
> probably correct for its name.
>
>
>> + int num_devices;
>> + /*
>> + * this array is allocated before setup the remus devices by the
>> + * remus abstract layer.
>> + * the size of this array is 'num_devices', which is the total number
>> + * of libxl nic devices and disk devices(num_nics + num_disks).
>> + */
>> + libxl__remus_device **dev;
>
> (As I said before) this comment leaves some questions unanswered:
>
> What proportion of the devs array is initialised at any one time ?
> May the devs array contain null pointers and what do they mean ? etc.
>
> (And, sorry for not noticing this last time, but I think this variable
> needs to be called `devs' rather than `dev'.)
>
>> +/*
>> + * Information about a single device being handled by remus.
>> + * Allocated by the remus abstract layer.
>> + */
>> +struct libxl__remus_device {
>> + /*----- shared between abstract and concrete layers -----*/
>> + /*
>> + * if this is true, that means the subkind ops matched the
>> + * device and we have actually set up the device no matter
>> + * setup succeed or not.
>> + */
>> + int set_up;
>
> I don't understand this. The protocol documented in
> libxl__remus_device_subkind_ops seems to be how the subkind
> communicates to the abstract layer whether the device was successfully
> set up. Is this variable in fact solely for the abstract layer ?
>
> Also, "we have actually set up the device" and "setup succeeded" seem
> to be the same thing.
>
> (Also, can it be a boolean?)
>
>> + /* find the error that was not ERROR_REMUS_DEVOPS_DOES_NOT_MATCH */
>> + for (i = 0; i < rds->num_devices; i++) {
>> + dev = rds->dev[i];
>> +
>> + if (!dev->aodev.rc || dev->aodev.rc == ERROR_REMUS_DEVOPS_DOES_NOT_MATCH)
>
> This is quite tortuous. I think you probably want to do it
> differently by having two layers of callback function:
>
> You should probably make the multidev->callback only when you have
> found the right subkind (or failed).
>
> So the subkind should be told to use a different callback which is
> handled here in the abstract type code. Then your abstract code can
> iterate separately through each subkind, rather than hunting through
> the innards of multidev.
>
> (I think that accessing aodev->rc here is a layering violation.)
>
>
> Thanks,
> Ian.
> .
>
--
Thanks,
Yang.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer
2014-08-27 1:46 ` Hongyang Yang
@ 2014-08-27 2:21 ` Ian Jackson
2014-08-27 2:27 ` Hongyang Yang
0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2014-08-27 2:21 UTC (permalink / raw)
To: Hongyang Yang
Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
xen-devel, rshriram, ian.campbell
Hongyang Yang writes ("Re: [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer"):
> Thanks for the review; I'm addressing these comments. What do you think of
> the rest of the patches? Do you intend to review them all this time, or just
> stop here and review the next version?
I'm away at a conference right now and will be travelling much of the
next few weeks. I think you should send your series when you've dealt
with the outstanding issues. I will have a day or two in the office
next week and might be able to pay some attention to it then. Sorry
for not being responsive.
Thanks,
Ian.
* Re: [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer
2014-08-27 2:21 ` Ian Jackson
@ 2014-08-27 2:27 ` Hongyang Yang
0 siblings, 0 replies; 20+ messages in thread
From: Hongyang Yang @ 2014-08-27 2:27 UTC (permalink / raw)
To: Ian Jackson
Cc: laijs, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
xen-devel, rshriram, ian.campbell
On 08/27/2014 10:21 AM, Ian Jackson wrote:
> Hongyang Yang writes ("Re: [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer"):
>> Thanks for the review; I'm addressing these comments. What do you think of
>> the rest of the patches? Do you intend to review them all this time, or just
>> stop here and review the next version?
>
> I'm away at a conference right now and will be travelling much of the
> next few weeks. I think you should send your series when you've dealt
> with the outstanding issues. I will have a day or two in the office
> next week and might be able to pay some attention to it then. Sorry
> for not being responsive.
Thank you for the information. I will send the next version as soon as
possible, and hope it will reach you while you have time to look at it.
Thanks,
Yang.
>
> Thanks,
> Ian.
> .
>
--
Thanks,
Yang.
* [PATCH v18 05/11] libxl/remus: setup and control network output buffering
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (3 preceding siblings ...)
2014-07-28 9:23 ` [PATCH v18 04/11] libxl/remus: introduce an abstract Remus device layer Yang Hongyang
@ 2014-07-28 9:23 ` Yang Hongyang
2014-07-28 9:24 ` [PATCH v18 06/11] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
` (6 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:23 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
This patch adds the machinery required for protecting a guest's
network device state. This patch comprises two parts:
1. Hotplug scripts: The remus-netbuf-setup script is responsible for
setting up and tearing down the necessary infrastructure required for
network output buffering. This script should be invoked by libxl for
each of the guest's network interfaces, when starting or stopping Remus.
Apart from returning success/failure indication via the usual hotplug
entries in xenstore, this script also writes to xenstore, the name of
the REMUS_IFB device to be used to control the vif's network output.
The script relies on libnl3 command line utilities to perform various
setup/teardown functions. The script is confined to Linux platforms only
since NetBSD does not seem to have libnl3.
2. Remus network device: Implements the interfaces required by the
remus abstract device layer. A note about the implementation:
a) init() & cleanup() are called once per Remus invocation. They
establish and free netlink related state respectively.
b) setup() and teardown() are called for each vif attached to the
guest.
During setup():
i) The hotplug script is called to set up a network buffer on a
given vif. The script chooses an available IFB device from
the system, redirects vif egress traffic to the IFB device
and sets up the plug qdisc (output buffer) on the IFB device.
The name of the IFB device is communicated via xenstore to
libxl.
ii) Libxl obtains a handle to the plug qdisc using the libnl3 API
and subsequently controls output buffering using this handle
in the checkpoint callbacks.
During teardown(), the hotplug scripts are called again to remove
the vif->ifb traffic redirection, release the ifb and the plug
qdisc associated with it.
c) The checkpoint callbacks [postsuspend(), preresume() and commit()]
are implemented as synchronous ops as the netlink calls associated
with the qdisc subsystem are very fast.
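
As an illustration of point (c), the following is a minimal standalone sketch of a checkpoint op that is synchronous internally but still honours the asynchronous contract of the abstract layer, i.e. it reports completion through a callback as the last thing it does. It is not code from this patch: the device struct, fast_commit(), run_commit() and on_done() are hypothetical names, and the counter increment merely stands in for a quick netlink request. The dispatcher also shows the convention that a NULL op is treated as a no-op.

```c
#include <assert.h>
#include <stddef.h>

typedef struct device device;

struct device {
    int set_up;        /* non-zero once setup has matched this device */
    int work_done;     /* counts the fast synchronous operations performed */
    int rc;            /* result reported through the callback */
    int callbacks;     /* how many times the callback has fired */
    void (*callback)(device *dev, int rc);
    void (*commit)(device *dev);   /* may be NULL: treated as a no-op */
};

/* Completion callback: records the result for later inspection. */
static void on_done(device *dev, int rc)
{
    dev->rc = rc;
    dev->callbacks++;
}

/* Synchronous implementation of an asynchronous-style op. */
static void fast_commit(device *dev)
{
    dev->work_done++;          /* stands in for a quick netlink request */
    dev->callback(dev, 0);     /* last thing it does: report completion */
}

/*
 * Run the commit op on every device that is set up and implements it.
 * Returns the number of devices on which the op was started.
 */
static int run_commit(device *devs, size_t n)
{
    int started = 0;

    assert(devs || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].set_up || !devs[i].commit)
            continue;
        devs[i].commit(&devs[i]);
        started++;
    }
    return started;
}
```

Keeping the callback even in the synchronous case means a slower device type could later switch to a genuinely asynchronous implementation without changing the caller.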
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
docs/misc/xenstore-paths.markdown | 4 +
tools/hotplug/Linux/Makefile | 1 +
tools/hotplug/Linux/remus-netbuf-setup | 237 ++++++++++++++++
tools/libxl/libxl.c | 7 +
tools/libxl/libxl_internal.h | 7 +
tools/libxl/libxl_netbuffer.c | 485 +++++++++++++++++++++++++++++++++
tools/libxl/libxl_nonetbuffer.c | 25 ++
tools/libxl/libxl_remus_device.c | 2 +
8 files changed, 768 insertions(+)
create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index ea67536..d94ea9d 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -393,6 +393,10 @@ The guest's virtual time offset from UTC in seconds.
The device model version for a domain.
+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
[BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
[FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
[HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index d5de9e6..721f8c0 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -16,6 +16,7 @@ XEN_SCRIPTS += vif-nat
XEN_SCRIPTS += vif-openvswitch
XEN_SCRIPTS += vif2
XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
XEN_SCRIPTS += block
XEN_SCRIPTS += block-enbd block-nbd
XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..64d44d5
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,237 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the REMUS_IFB device details will be
+# stored or read from (required).
+# (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# REMUS_IFB ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the REMUS_IFB device serving
+# as the intermediate buffer through which the interface's network output
+# can be controlled.
+#
+
+# Remus network buffering requirements:
+
+# We need to buffer (queue) egress traffic from every vif attached to
+# the guest and release the buffers when the checkpoint associated
+# with them has been committed at the backup host. We achieve this
+# with the help of the plug queuing discipline (sch_plug module).
+# Simply put, Remus' network buffering imposes traffic
+# shaping on the guest's vif(s).
+
+# Limitations and Workarounds:
+
+# Egress traffic from a vif appears as ingress traffic to dom0. Linux
+# supports policing (dropping packets) but not traffic shaping
+# (queuing packets) on ingress traffic. The standard workaround to
+# this limitation is to attach an ingress qdisc to the guest vif,
+# redirect all egress traffic from the guest to an intermediate
+# queuing interface, and apply egress rules to it. The IFB
+# (Intermediate Functional Block) device serves the purpose of an
+# intermediate queuing interface.
+#
+
+# The following commands install a network buffer on a
+# guest's vif (vif1.0) using an IFB device (ifb0):
+#
+# ip link set dev ifb0 up
+# tc qdisc add dev vif1.0 ingress
+# tc filter add dev vif1.0 parent ffff: proto ip \
+# prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+# nl-qdisc-add --dev=ifb0 --parent root plug
+# nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+# (10MB limit on buffer)
+#
+# So the order of operations when installing a network buffer on vif1.0 is:
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+# 2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+# 2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+# guest's network output from vif1.0
+#
+# Note:
+# 1. If the setup process fails, the script's cleanup is limited to removing the
+# ingress qdisc on the guest vif, so that its traffic can flow normally.
+# The chosen ifb device is not torn down. Libxl has to execute the
+# teardown op to remove other qdiscs and subsequently free the IFB device.
+#
+# 2. The teardown op may be invoked multiple times by libxl.
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes
+# vif-specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a "$command" != "teardown" ]
+then
+ echo "Invalid command: $command"
+ log err "Invalid command: $command"
+ exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+ if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+ fatal "Unable to find nl-qdisc-list tool"
+ fi
+ if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+ fatal "Unable to find nl-qdisc-add tool"
+ fi
+ if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+ fatal "Unable to find nl-qdisc-delete tool"
+ fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+ for m in ifb sch_plug sch_ingress act_mirred cls_u32
+ do
+ if ! modinfo $m > /dev/null 2>&1; then
+ fatal "Unable to find $m kernel module"
+ fi
+ done
+}
+
+xs_write_failed() {
+ local vif=$1
+ local ifb=$2
+ teardown_netbuf "$vif" "$ifb"
+ fatal "failed to write ifb name to xenstore"
+}
+
+#return 0 if the ifb is free
+check_ifb() {
+ local installed=`nl-qdisc-list -d $1`
+ [ -n "$installed" ] && return 1
+
+ for domid in `xenstore-list "/local/domain" 2>/dev/null || true`
+ do
+ [ $domid -eq 0 ] && continue
+ xenstore-exists "/libxl/$domid/remus/netbuf" || continue
+ for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true`
+ do
+ local path="/libxl/$domid/remus/netbuf/$devid/ifb"
+ xenstore-exists $path || continue
+ local ifb=`xenstore-read "$path" 2>/dev/null || true`
+ [ "$ifb" = "$1" ] && return 1
+ done
+ done
+
+ return 0
+}
+
+setup_ifb() {
+
+ for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+ do
+ check_ifb "$ifb" || continue
+ REMUS_IFB="$ifb"
+ break
+ done
+
+ if [ -z "$REMUS_IFB" ]
+ then
+ fatal "Unable to find a free ifb device for $vifname"
+ fi
+
+ #not using xenstore_write that automatically exits on error
+ #because we need to cleanup
+ _xenstore_write "$XENBUS_PATH/ifb" "$REMUS_IFB" || xs_write_failed "$vifname" "$REMUS_IFB"
+ do_or_die ip link set dev "$REMUS_IFB" up
+}
+
+redirect_vif_traffic() {
+ local vif=$1
+ local ifb=$2
+
+ do_or_die tc qdisc add dev "$vif" ingress
+
+ tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+ u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+ if [ $? -ne 0 ]
+ then
+ do_without_error tc qdisc del dev "$vif" ingress
+ fatal "Failed to redirect traffic from $vif to $ifb"
+ fi
+}
+
+add_plug_qdisc() {
+ local vif=$1
+ local ifb=$2
+
+ nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+ if [ $? -ne 0 ]
+ then
+ do_without_error tc qdisc del dev "$vif" ingress
+ fatal "Failed to add plug qdisc to $ifb"
+ fi
+
+ #set ifb buffering limit in bytes. It's okay if this command fails
+ nl-qdisc-add --dev="$ifb" --parent root \
+ --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+ local vif=$1
+ local ifb=$2
+
+ #Check if the XENBUS_PATH/ifb exists and has IFB name same as REMUS_IFB.
+ #Otherwise, if the teardown op is called multiple times, then we may end
+ #up freeing another domain's allocated IFB inside the if loop.
+ xenstore-exists "$XENBUS_PATH/ifb" && \
+ local ifb2=`xenstore-read "$XENBUS_PATH/ifb" 2>/dev/null || true`
+
+ if [[ "$ifb2" && "$ifb2" == "$ifb" ]]; then
+ do_without_error ip link set dev "$ifb" down
+ do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+ xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+ fi
+ do_without_error tc qdisc del dev "$vif" ingress
+ xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+ xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+case "$command" in
+ setup)
+ check_libnl_tools
+ check_modules
+
+ claim_lock "pickifb"
+ setup_ifb
+ redirect_vif_traffic "$vifname" "$REMUS_IFB"
+ add_plug_qdisc "$vifname" "$REMUS_IFB"
+ release_lock "pickifb"
+
+ success
+ ;;
+ teardown)
+ teardown_netbuf "$vifname" "$REMUS_IFB"
+ ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $REMUS_IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 95eead8..191469b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -818,6 +818,13 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
/* Convenience aliases */
libxl__remus_devices_state *const rds = &dss->rds;
+
+ if (!libxl__netbuffer_enabled(gc)) {
+ LOG(ERROR, "Remus: No support for network buffering");
+ goto out;
+ }
+ rds->device_kind_flags |= LIBXL__REMUS_DEVICE_NIC;
+
rds->ao = ao;
rds->egc = egc;
rds->domid = domid;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 91ba122..c6f1411 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2606,6 +2606,13 @@ struct libxl__remus_devices_state {
int num_disks;
libxl__multidev multidev;
+
+ /*----- private for concrete (device-specific) layer only -----*/
+
+ /* private for nic device subkind ops */
+ char *netbufscript;
+ struct nl_sock *nlsock;
+ struct nl_cache *qdisc_cache;
};
/*
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 52d593c..5093e0d 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,496 @@
#include "libxl_internal.h"
+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_device_nic {
+ int devid;
+
+ const char *vif;
+ const char *ifb;
+ struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
int libxl__netbuffer_enabled(libxl__gc *gc)
{
return 1;
}
+/*----- init() and cleanup() -----*/
+
+static int nic_init(libxl__remus_devices_state *rds)
+{
+ int rc, ret;
+
+ STATE_AO_GC(rds->ao);
+
+ rds->nlsock = nl_socket_alloc();
+ if (!rds->nlsock) {
+ LOG(ERROR, "cannot allocate nl socket");
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ ret = nl_connect(rds->nlsock, NETLINK_ROUTE);
+ if (ret) {
+ LOG(ERROR, "failed to open netlink socket: %s",
+ nl_geterror(ret));
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ /* get list of all qdiscs installed on network devs. */
+ ret = rtnl_qdisc_alloc_cache(rds->nlsock, &rds->qdisc_cache);
+ if (ret) {
+ LOG(ERROR, "failed to allocate qdisc cache: %s",
+ nl_geterror(ret));
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+ libxl__xen_script_dir_path());
+
+ rc = 0;
+
+out:
+ return rc;
+}
+
+static void nic_cleanup(libxl__remus_devices_state *rds)
+{
+ STATE_AO_GC(rds->ao);
+
+ /* free qdisc cache */
+ if (rds->qdisc_cache) {
+ nl_cache_clear(rds->qdisc_cache);
+ nl_cache_free(rds->qdisc_cache);
+ rds->qdisc_cache = NULL;
+ }
+
+ /* close & free nlsock */
+ if (rds->nlsock) {
+ nl_close(rds->nlsock);
+ nl_socket_free(rds->nlsock);
+ rds->nlsock = NULL;
+ }
+}
+
+/*----- setup() and teardown() -----*/
+
+/* helper functions */
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * This must ONLY be used for Remus, because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+ const libxl_device_nic *nic)
+{
+ const char *vifname = NULL;
+ const char *path;
+ int rc;
+
+ STATE_AO_GC(dev->rds->ao);
+
+ /* Convenience aliases */
+ const uint32_t domid = dev->rds->domid;
+
+ path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+ libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+ rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+ if (!rc && !vifname) {
+ vifname = libxl__device_nic_devname(gc, domid,
+ nic->devid,
+ nic->nictype);
+ }
+
+ return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+ if (remus_nic->qdisc == NULL)
+ return;
+
+ nl_object_put((struct nl_object *)(remus_nic->qdisc));
+ remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_devices_state *rds,
+ libxl__remus_device_nic *remus_nic)
+{
+ int rc, ret, ifindex;
+ struct rtnl_link *ifb = NULL;
+ struct rtnl_qdisc *qdisc = NULL;
+
+ STATE_AO_GC(rds->ao);
+
+ /* Now that we have brought up the REMUS_IFB device with the plug
+ * qdisc for this vif, we need to refill the qdisc cache.
+ */
+ ret = nl_cache_refill(rds->nlsock, rds->qdisc_cache);
+ if (ret) {
+ LOG(ERROR, "cannot refill qdisc cache: %s", nl_geterror(ret));
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ /* get a handle to the REMUS_IFB interface */
+ ret = rtnl_link_get_kernel(rds->nlsock, 0, remus_nic->ifb, &ifb);
+ if (ret) {
+ LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+ nl_geterror(ret));
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ ifindex = rtnl_link_get_ifindex(ifb);
+ if (!ifindex) {
+ LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ /* Get a reference to the root qdisc installed on the REMUS_IFB, by
+ * querying the qdisc list we obtained earlier. The netbufscript
+ * sets up the plug qdisc as the root qdisc, so we don't have to
+ * search the entire qdisc tree on the REMUS_IFB dev.
+ *
+ * There is no need to explicitly free this qdisc as it's just a
+ * reference from the qdisc cache we allocated earlier.
+ */
+ qdisc = rtnl_qdisc_get_by_parent(rds->qdisc_cache, ifindex, TC_H_ROOT);
+ if (qdisc) {
+ const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+ /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+ if (!tc_kind || strcmp(tc_kind, "plug")) {
+ LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+ rc = ERROR_FAIL;
+ goto out;
+ }
+ remus_nic->qdisc = qdisc;
+ } else {
+ LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ rc = 0;
+
+out:
+ if (ifb)
+ rtnl_link_put(ifb);
+
+ if (rc && qdisc)
+ nl_object_put((struct nl_object *)qdisc);
+
+ return rc;
+}
+
+/* callbacks */
+
+static void netbuf_setup_script_cb(libxl__egc *egc,
+ libxl__async_exec_state *aes,
+ int status);
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+ libxl__async_exec_state *aes,
+ int status);
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $REMUS_IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__remus_device *dev, char *op)
+{
+ int arraysize, nr = 0;
+ char **env = NULL, **args = NULL;
+ libxl__remus_device_nic *remus_nic = dev->concrete_data;
+ libxl__remus_devices_state *rds = dev->rds;
+ libxl__async_exec_state *aes = &dev->aodev.aes;
+
+ STATE_AO_GC(rds->ao);
+
+ /* Convenience aliases */
+ char *const script = libxl__strdup(gc, rds->netbufscript);
+ const uint32_t domid = rds->domid;
+ const int dev_id = remus_nic->devid;
+ const char *const vif = remus_nic->vif;
+ const char *const ifb = remus_nic->ifb;
+
+ arraysize = 7;
+ GCNEW_ARRAY(env, arraysize);
+ env[nr++] = "vifname";
+ env[nr++] = libxl__strdup(gc, vif);
+ env[nr++] = "XENBUS_PATH";
+ env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+ libxl__xs_libxl_path(gc, domid), dev_id);
+ if (!strcmp(op, "teardown") && ifb) {
+ env[nr++] = "REMUS_IFB";
+ env[nr++] = libxl__strdup(gc, ifb);
+ }
+ env[nr++] = NULL;
+ assert(nr <= arraysize);
+
+ arraysize = 3; nr = 0;
+ GCNEW_ARRAY(args, arraysize);
+ args[nr++] = script;
+ args[nr++] = op;
+ args[nr++] = NULL;
+ assert(nr == arraysize);
+
+ aes->ao = dev->rds->ao;
+ aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+ aes->env = env;
+ aes->args = args;
+ aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+ aes->stdfds[0] = -1;
+ aes->stdfds[1] = -1;
+ aes->stdfds[2] = -1;
+
+ if (!strcmp(op, "teardown"))
+ aes->callback = netbuf_teardown_script_cb;
+ else
+ aes->callback = netbuf_setup_script_cb;
+}
+
+/* setup() and teardown() */
+
+static void nic_setup(libxl__remus_device *dev)
+{
+ int rc;
+ libxl__remus_device_nic *remus_nic;
+ const libxl_device_nic *nic = dev->backend_dev;
+
+ STATE_AO_GC(dev->rds->ao);
+
+ /*
+ * there's no subkind of nic devices, so nic ops always match
+ * nic devices; we begin to set up the nic device
+ */
+ dev->set_up = 1;
+
+ GCNEW(remus_nic);
+ dev->concrete_data = remus_nic;
+ remus_nic->devid = nic->devid;
+ remus_nic->vif = get_vifname(dev, nic);
+ if (!remus_nic->vif) {
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ setup_async_exec(dev, "setup");
+ rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+ if (rc)
+ goto out;
+
+ return;
+
+out:
+ dev->aodev.rc = rc;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+/*
+ * In return, the script writes the name of REMUS_IFB device (during setup)
+ * to be used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+ libxl__async_exec_state *aes,
+ int status)
+{
+ libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+ libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+ libxl__remus_device_nic *remus_nic = dev->concrete_data;
+ libxl__remus_devices_state *rds = dev->rds;
+ const char *out_path_base, *hotplug_error = NULL;
+ int rc;
+
+ STATE_AO_GC(rds->ao);
+
+ /* Convenience aliases */
+ const uint32_t domid = rds->domid;
+ const int devid = remus_nic->devid;
+ const char *const vif = remus_nic->vif;
+ const char **const ifb = &remus_nic->ifb;
+
+ /*
+ * we need to get ifb first because it's needed for teardown
+ */
+ rc = libxl__xs_read_checked(gc, XBT_NULL,
+ GCSPRINTF("%s/remus/netbuf/%d/ifb",
+ libxl__xs_libxl_path(gc, domid),
+ devid),
+ ifb);
+ if (rc)
+ goto out;
+
+ if (!(*ifb)) {
+ LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+ domid, vif);
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+ libxl__xs_libxl_path(gc, domid), devid);
+
+ rc = libxl__xs_read_checked(gc, XBT_NULL,
+ GCSPRINTF("%s/hotplug-error", out_path_base),
+ &hotplug_error);
+ if (rc)
+ goto out;
+
+ if (hotplug_error) {
+ LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+ rds->netbufscript, vif, hotplug_error);
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ if (status) {
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+ rc = init_qdisc(rds, remus_nic);
+
+out:
+ aodev->rc = rc;
+ aodev->callback(egc, aodev);
+}
+
+static void nic_teardown(libxl__remus_device *dev)
+{
+ int rc;
+ STATE_AO_GC(dev->rds->ao);
+
+ setup_async_exec(dev, "teardown");
+
+ rc = libxl__async_exec_start(gc, &dev->aodev.aes);
+ if (rc)
+ goto out;
+
+ return;
+
+out:
+ dev->aodev.rc = rc;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+ libxl__async_exec_state *aes,
+ int status)
+{
+ int rc;
+ libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+ libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+ libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+ if (status)
+ rc = ERROR_FAIL;
+ else
+ rc = 0;
+
+ free_qdisc(remus_nic);
+
+ aodev->rc = rc;
+ aodev->callback(egc, aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* The value of buffer_op, not the value passed to kernel */
+enum {
+ tc_buffer_start,
+ tc_buffer_release
+};
+
+/* API implementations */
+
+static int remus_netbuf_op(libxl__remus_device_nic *remus_nic,
+ libxl__remus_devices_state *rds,
+ int buffer_op)
+{
+ int rc, ret;
+
+ STATE_AO_GC(rds->ao);
+
+ if (buffer_op == tc_buffer_start)
+ ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+ else
+ ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+ if (ret) {
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ ret = rtnl_qdisc_add(rds->nlsock, remus_nic->qdisc, NLM_F_REQUEST);
+ if (ret) {
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ rc = 0;
+
+out:
+ if (rc)
+ LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+ ((buffer_op == tc_buffer_start) ?
+ "start_new_epoch" : "release_prev_epoch"),
+ remus_nic->ifb, nl_geterror(ret));
+ return rc;
+}
+
+static void nic_postsuspend(libxl__remus_device *dev)
+{
+ int rc;
+ libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+ STATE_AO_GC(dev->rds->ao);
+
+ rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_start);
+
+ dev->aodev.rc = rc;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+static void nic_commit(libxl__remus_device *dev)
+{
+ int rc;
+ libxl__remus_device_nic *remus_nic = dev->concrete_data;
+
+ STATE_AO_GC(dev->rds->ao);
+
+ rc = remus_netbuf_op(remus_nic, dev->rds, tc_buffer_release);
+
+ dev->aodev.rc = rc;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+const libxl__remus_device_subkind_ops remus_device_nic = {
+ .kind = LIBXL__REMUS_DEVICE_NIC,
+ .init = nic_init,
+ .cleanup = nic_cleanup,
+ .setup = nic_setup,
+ .teardown = nic_teardown,
+ .postsuspend = nic_postsuspend,
+ .commit = nic_commit,
+};
+
/*
* Local variables:
* mode: C
diff --git a/tools/libxl/libxl_nonetbuffer.c b/tools/libxl/libxl_nonetbuffer.c
index 1c72a7f..28a8326 100644
--- a/tools/libxl/libxl_nonetbuffer.c
+++ b/tools/libxl/libxl_nonetbuffer.c
@@ -22,6 +22,31 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
return 0;
}
+static void nic_setup(libxl__remus_device *dev)
+{
+ STATE_AO_GC(dev->rds->ao);
+
+ dev->aodev.rc = ERROR_FAIL;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+static int nic_init(libxl__remus_devices_state *rds)
+{
+ return 0;
+}
+
+static void nic_cleanup(libxl__remus_devices_state *rds)
+{
+ return;
+}
+
+const libxl__remus_device_subkind_ops remus_device_nic = {
+ .kind = LIBXL__REMUS_DEVICE_NIC,
+ .init = nic_init,
+ .cleanup = nic_cleanup,
+ .setup = nic_setup,
+};
+
/*
* Local variables:
* mode: C
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index 9ca9468..e9b0e20 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -17,7 +17,9 @@
#include "libxl_internal.h"
+extern const libxl__remus_device_subkind_ops remus_device_nic;
static const libxl__remus_device_subkind_ops *remus_ops[] = {
+ &remus_device_nic,
NULL,
};
--
1.9.1
* [PATCH v18 06/11] libxl/remus: setup and control disk replication for DRBD backends
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (4 preceding siblings ...)
2014-07-28 9:23 ` [PATCH v18 05/11] libxl/remus: setup and control network output buffering Yang Hongyang
@ 2014-07-28 9:24 ` Yang Hongyang
2014-07-28 9:24 ` [PATCH v18 07/11] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
` (5 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:24 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
This patch adds the machinery required for protecting a guest's
disk state, when the guest disk uses a DRBD disk backend.
This patch comprises two parts:
1. Hotplug scripts: The block-drbd-probe script is responsible for
performing sanity checks on the state of the DRBD disk before the
checkpointing process begins. This script should be invoked by
libxl for each of the guest's disk devices, when starting Remus.
2. Remus drbd disk device: Implements the interfaces required by the
remus abstract device layer. A note about the implementation:
a) setup() is called for each disk attached to the guest.
During setup():
i) The hotplug script is called to perform the sanity check.
ii) Libxl obtains a handle to the DRBD device (/dev/drbd*) and
subsequently controls disk checkpoint replication using
this handle in the checkpoint callbacks.
b) The preresume() checkpoint callback is executed asynchronously
using libxl__ev_child_fork(), as it may potentially block for more
than a few seconds in case of backup failure.
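
As an illustration of the sanity check in part 1, the sketch below maps the exit codes documented in the block-drbd-probe header to readable messages. The helper name describe_drbd_probe_status is hypothetical and not part of this patch; only the numeric codes and their meanings are taken from the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of this patch): translate the exit status of
# block-drbd-probe into a human-readable message. The codes 0-4 are those
# listed in the script's "Return value" header.
describe_drbd_probe_status() {
    case "$1" in
        0) echo "drbd device, all checks passed" ;;
        1) echo "not a drbd device" ;;
        2) echo "unknown error" ;;
        3) echo "drbd device does not use protocol D" ;;
        4) echo "drbd device is not ready" ;;
        *) echo "unexpected status $1" ;;
    esac
}
```

A caller could run the probe on a device path and then pass $? to describe_drbd_probe_status. Note that libxl itself does not distinguish the codes in this patch: match_async_exec_cb treats any non-zero status as ERROR_REMUS_DEVOPS_DOES_NOT_MATCH.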
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Edits to commit message:
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
tools/hotplug/Linux/Makefile | 1 +
tools/hotplug/Linux/block-drbd-probe | 85 ++++++++++++
tools/libxl/Makefile | 2 +-
tools/libxl/libxl.c | 1 +
tools/libxl/libxl_internal.h | 3 +
tools/libxl/libxl_remus_device.c | 2 +
tools/libxl/libxl_remus_disk_drbd.c | 260 +++++++++++++++++++++++++++++++++++
7 files changed, 353 insertions(+), 1 deletion(-)
create mode 100755 tools/hotplug/Linux/block-drbd-probe
create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 721f8c0..15d1b37 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -24,6 +24,7 @@ XEN_SCRIPTS += xen-hotplug-cleanup
XEN_SCRIPTS += external-device-migrate
XEN_SCRIPTS += vscsi
XEN_SCRIPTS += block-iscsi
+XEN_SCRIPTS += block-drbd-probe
XEN_SCRIPTS += $(XEN_SCRIPTS-y)
XEN_SCRIPT_DATA = xen-script-common.sh locking.sh logging.sh
diff --git a/tools/hotplug/Linux/block-drbd-probe b/tools/hotplug/Linux/block-drbd-probe
new file mode 100755
index 0000000..3a3d446
--- /dev/null
+++ b/tools/hotplug/Linux/block-drbd-probe
@@ -0,0 +1,85 @@
+#! /bin/bash
+#
+# Copyright (C) 2014 FUJITSU LIMITED
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General Public
+# License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+#
+# Usage:
+# block-drbd-probe devicename
+#
+# Return value:
+# 0: the device is a drbd device
+# 1: the device is not a drbd device
+# 2: unknown error
+# 3: the drbd device does not use protocol D
+# 4: the drbd device is not ready
+
+drbd_res=
+
+function get_res_name()
+{
+ local drbd_dev=$1
+ local drbd_dev_list=($(drbdadm sh-dev all))
+ local drbd_res_list=($(drbdadm sh-resource all))
+ local temp_drbd_dev temp_drbd_res
+ local found=0
+
+ for temp_drbd_dev in ${drbd_dev_list[@]}; do
+ if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+ found=1
+ break
+ fi
+ done
+
+ if [[ $found -eq 0 ]]; then
+ return 1
+ fi
+
+ for temp_drbd_res in ${drbd_res_list[@]}; do
+ temp_drbd_dev=$(drbdadm sh-dev $temp_drbd_res)
+ if [[ "$temp_drbd_dev" == "$drbd_dev" ]]; then
+ drbd_res="$temp_drbd_res"
+ return 0
+ fi
+ done
+
+ # OOPS
+ return 2
+}
+
+get_res_name $1
+rc=$?
+if [[ $rc -ne 0 ]]; then
+ exit $rc
+fi
+
+# check protocol
+drbdsetup $1 show | grep -q "protocol D;"
+if [[ $? -ne 0 ]]; then
+ exit 3
+fi
+
+# check connect status
+state=$(drbdadm cstate "$drbd_res")
+if [[ "$state" != "Connected" ]]; then
+ exit 4
+fi
+
+# check role
+role=$(drbdadm role "$drbd_res")
+if [[ "$role" != "Primary/Secondary" ]]; then
+ exit 4
+fi
+
+exit 0
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 202f1bb..ba10ab7 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -56,7 +56,7 @@ else
LIBXL_OBJS-y += libxl_nonetbuffer.o
endif
-LIBXL_OBJS-y += libxl_remus_device.o
+LIBXL_OBJS-y += libxl_remus_device.o libxl_remus_disk_drbd.o
LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o
LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 191469b..021d77c 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -824,6 +824,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
goto out;
}
rds->device_kind_flags |= LIBXL__REMUS_DEVICE_NIC;
+ rds->device_kind_flags |= LIBXL__REMUS_DEVICE_DISK;
rds->ao = ao;
rds->egc = egc;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c6f1411..e631eaf 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2613,6 +2613,9 @@ struct libxl__remus_devices_state {
char *netbufscript;
struct nl_sock *nlsock;
struct nl_cache *qdisc_cache;
+
+ /* private for drbd disk subkind ops */
+ char *drbd_probe_script;
};
/*
diff --git a/tools/libxl/libxl_remus_device.c b/tools/libxl/libxl_remus_device.c
index e9b0e20..b19c372 100644
--- a/tools/libxl/libxl_remus_device.c
+++ b/tools/libxl/libxl_remus_device.c
@@ -18,8 +18,10 @@
#include "libxl_internal.h"
extern const libxl__remus_device_subkind_ops remus_device_nic;
+extern const libxl__remus_device_subkind_ops remus_device_drbd_disk;
static const libxl__remus_device_subkind_ops *remus_ops[] = {
&remus_device_nic,
+ &remus_device_drbd_disk,
NULL,
};
diff --git a/tools/libxl/libxl_remus_disk_drbd.c b/tools/libxl/libxl_remus_disk_drbd.c
new file mode 100644
index 0000000..59db54f
--- /dev/null
+++ b/tools/libxl/libxl_remus_disk_drbd.c
@@ -0,0 +1,260 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author Lai Jiangshan <laijs@cn.fujitsu.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+/*** drbd implementation ***/
+const int DRBD_SEND_CHECKPOINT = 20;
+const int DRBD_WAIT_CHECKPOINT_ACK = 30;
+
+typedef struct libxl__remus_drbd_disk {
+ int ctl_fd;
+ int ackwait;
+} libxl__remus_drbd_disk;
+
+/*----- helper functions, for async calls -----*/
+static void drbd_async_call(libxl__remus_device *dev,
+ void func(libxl__remus_device *),
+ libxl__ev_child_callback callback)
+{
+ int pid = -1, rc;
+ libxl__ao_device *aodev = &dev->aodev;
+ STATE_AO_GC(dev->rds->ao);
+
+ /* Fork and call */
+ pid = libxl__ev_child_fork(gc, &aodev->child, callback);
+ if (pid == -1) {
+ LOG(ERROR, "unable to fork");
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ if (!pid) {
+ /* child */
+ func(dev);
+ /* notreached */
+ abort();
+ }
+
+ return;
+
+out:
+ aodev->rc = rc;
+ aodev->callback(dev->rds->egc, aodev);
+}
+
+/*----- init() and cleanup() -----*/
+static int drbd_init(libxl__remus_devices_state *rds)
+{
+ STATE_AO_GC(rds->ao);
+
+ rds->drbd_probe_script = GCSPRINTF("%s/block-drbd-probe",
+ libxl__xen_script_dir_path());
+
+ return 0;
+}
+
+static void drbd_cleanup(libxl__remus_devices_state *rds)
+{
+ return;
+}
+
+/*----- match(), setup() and teardown() -----*/
+
+/* callbacks */
+static void match_async_exec_cb(libxl__egc *egc,
+ libxl__async_exec_state *aes,
+ int status);
+
+/* implementations */
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev);
+
+static void drbd_setup(libxl__remus_device *dev)
+{
+ STATE_AO_GC(dev->rds->ao);
+
+ match_async_exec(dev->rds->egc, dev);
+}
+
+static void match_async_exec(libxl__egc *egc, libxl__remus_device *dev)
+{
+ int arraysize, nr = 0, rc;
+ const libxl_device_disk *disk = dev->backend_dev;
+ libxl__async_exec_state *aes = &dev->aodev.aes;
+ STATE_AO_GC(dev->rds->ao);
+
+ /* setup env & args */
+ arraysize = 1;
+ GCNEW_ARRAY(aes->env, arraysize);
+ aes->env[nr++] = NULL;
+ assert(nr <= arraysize);
+
+ arraysize = 3;
+ nr = 0;
+ GCNEW_ARRAY(aes->args, arraysize);
+ aes->args[nr++] = dev->rds->drbd_probe_script;
+ aes->args[nr++] = disk->pdev_path;
+ aes->args[nr++] = NULL;
+ assert(nr <= arraysize);
+
+ aes->ao = dev->rds->ao;
+ aes->what = GCSPRINTF("%s %s", aes->args[0], aes->args[1]);
+ aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+ aes->callback = match_async_exec_cb;
+ aes->stdfds[0] = -1;
+ aes->stdfds[1] = -1;
+ aes->stdfds[2] = -1;
+
+ rc = libxl__async_exec_start(gc, aes);
+ if (rc)
+ goto out;
+
+ return;
+
+out:
+ dev->aodev.rc = rc;
+ dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void match_async_exec_cb(libxl__egc *egc,
+ libxl__async_exec_state *aes,
+ int status)
+{
+ int rc;
+ libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+ libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+ libxl__remus_drbd_disk *drbd_disk;
+ const libxl_device_disk *disk = dev->backend_dev;
+
+ STATE_AO_GC(aodev->ao);
+
+ if (status) {
+ rc = ERROR_REMUS_DEVOPS_DOES_NOT_MATCH;
+ goto out;
+ }
+
+ /* ops matched, setup the device */
+ dev->set_up = 1;
+
+ GCNEW(drbd_disk);
+ dev->concrete_data = drbd_disk;
+ drbd_disk->ackwait = 0;
+ drbd_disk->ctl_fd = open(disk->pdev_path, O_RDONLY);
+ if (drbd_disk->ctl_fd < 0) {
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ rc = 0;
+
+out:
+ aodev->rc = rc;
+ aodev->callback(egc, aodev);
+}
+
+static void drbd_teardown(libxl__remus_device *dev)
+{
+ libxl__remus_drbd_disk *drbd_disk = dev->concrete_data;
+ STATE_AO_GC(dev->rds->ao);
+
+ close(drbd_disk->ctl_fd);
+ dev->aodev.rc = 0;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+/*----- checkpointing APIs -----*/
+
+/* callbacks */
+static void chekpoint_async_call_done(libxl__egc *egc,
+ libxl__ev_child *child,
+ pid_t pid, int status);
+
+/* API implementations */
+
+/* this op will not wait and block, so implement as sync op */
+static void drbd_postsuspend(libxl__remus_device *dev)
+{
+ STATE_AO_GC(dev->rds->ao);
+
+ libxl__remus_drbd_disk *rdd = dev->concrete_data;
+
+ if (!rdd->ackwait) {
+ if (ioctl(rdd->ctl_fd, DRBD_SEND_CHECKPOINT, 0) <= 0)
+ rdd->ackwait = 1;
+ }
+
+ dev->aodev.rc = 0;
+ dev->aodev.callback(dev->rds->egc, &dev->aodev);
+}
+
+
+static void drbd_preresume_async(libxl__remus_device *dev);
+
+static void drbd_preresume(libxl__remus_device *dev)
+{
+ STATE_AO_GC(dev->rds->ao);
+
+ drbd_async_call(dev, drbd_preresume_async, chekpoint_async_call_done);
+}
+
+static void drbd_preresume_async(libxl__remus_device *dev)
+{
+ libxl__remus_drbd_disk *rdd = dev->concrete_data;
+ int ackwait = rdd->ackwait;
+
+ if (ackwait) {
+ ioctl(rdd->ctl_fd, DRBD_WAIT_CHECKPOINT_ACK, 0);
+ ackwait = 0;
+ }
+
+ _exit(ackwait);
+}
+
+static void chekpoint_async_call_done(libxl__egc *egc,
+ libxl__ev_child *child,
+ pid_t pid, int status)
+{
+ int rc;
+ libxl__ao_device *aodev = CONTAINER_OF(child, *aodev, child);
+ libxl__remus_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+ libxl__remus_drbd_disk *rdd = dev->concrete_data;
+
+ STATE_AO_GC(aodev->ao);
+
+ if (!WIFEXITED(status)) {
+ rc = ERROR_FAIL;
+ goto out;
+ }
+
+ rdd->ackwait = WEXITSTATUS(status);
+ rc = 0;
+
+out:
+ aodev->rc = rc;
+ aodev->callback(egc, aodev);
+}
+
+const libxl__remus_device_subkind_ops remus_device_drbd_disk = {
+ .kind = LIBXL__REMUS_DEVICE_DISK,
+ .init = drbd_init,
+ .cleanup = drbd_cleanup,
+ .setup = drbd_setup,
+ .teardown = drbd_teardown,
+ .postsuspend = drbd_postsuspend,
+ .preresume = drbd_preresume,
+};
--
1.9.1
* [PATCH v18 07/11] xl/remus: cmdline switch to explicitly enable unsafe configurations
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (5 preceding siblings ...)
2014-07-28 9:24 ` [PATCH v18 06/11] libxl/remus: setup and control disk replication for DRBD backends Yang Hongyang
@ 2014-07-28 9:24 ` Yang Hongyang
2014-07-28 9:24 ` [PATCH v18 08/11] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
` (4 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:24 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
By default, network buffering and disk replication are enabled;
checkpoints are replicated to another standby VM.
This patch allows the user to disable any of these features by
explicitly specifying a 'run in unsafe mode' switch when invoking
the 'xl remus' command. While running Remus in an unsafe mode
makes little sense under normal circumstances, it is useful to be
able to disable one or more features mentioned above for
testing/debugging/profiling purposes.
Unless this option is enabled, it is not possible to replicate
memory checkpoints to /dev/null (blackhole replication) or to
disable network buffering or disk replication.
As a first step, the use of blackhole replication now requires that
unsafe mode be enabled. Subsequent patches will add support
for disabling network buffering and disk replication in a similar
manner.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
docs/man/xl.pod.1 | 15 ++++++++++-----
tools/libxl/libxl.c | 5 +++++
tools/libxl/libxl_types.idl | 1 +
tools/libxl/xl_cmdimpl.c | 11 ++++++++++-
tools/libxl/xl_cmdtable.c | 7 +++++--
5 files changed, 31 insertions(+), 8 deletions(-)
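
The gating rule described in the commit message above can be sketched
in isolation as follows. This is only an illustration under stated
assumptions: struct remus_opts_sketch and remus_opts_check() are
hypothetical names standing in for the two libxl_domain_remus_info
fields involved, and are not part of libxl or xl.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two fields of libxl_domain_remus_info
 * that matter here; this is not the real libxl type. */
struct remus_opts_sketch {
    bool unsafe;     /* set by 'xl remus -F' */
    bool blackhole;  /* set by 'xl remus -b' */
};

/* Returns 0 if the option combination is acceptable, -1 otherwise.
 * Blackhole replication is only accepted together with unsafe mode. */
static int remus_opts_check(const struct remus_opts_sketch *o)
{
    if (!o->unsafe && o->blackhole) {
        fprintf(stderr,
                "Unsafe mode must be enabled to replicate to /dev/null\n");
        return -1;
    }
    return 0;
}
```

With this rule, '-b' alone is rejected, while '-F -b', '-F' alone and
no flags at all are accepted.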
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 30bd4bf..3aedead 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -441,11 +441,6 @@ B<OPTIONS>
Checkpoint domain memory every MS milliseconds (default 200ms).
-=item B<-b>
-
-Replicate memory checkpoints to /dev/null (blackhole).
-Generally useful for debugging.
-
=item B<-u>
Disable memory checkpoint compression.
@@ -460,6 +455,16 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
On the new host, do not wait in the background (on <host>) for the death
of the domain. See the corresponding option of the I<create> subcommand.
+=item B<-F>
+
+Run Remus in unsafe mode. Use this option with caution as failover may
+not work as intended.
+
+=item B<-b>
+
+Replicate memory checkpoints to /dev/null (blackhole).
+Generally useful for debugging. Requires enabling unsafe mode.
+
=back
=item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 021d77c..6e488ca 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -797,6 +797,11 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
libxl__domain_suspend_state *dss;
int rc;
+ if (!info->unsafe && info->blackhole) {
+ LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
+ goto out;
+ }
+
libxl_domain_type type = libxl__domain_type(gc, domid);
if (type == LIBXL_DOMAIN_TYPE_INVALID) {
rc = ERROR_FAIL;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 25bd8f3..f4cff51 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -588,6 +588,7 @@ libxl_sched_credit_params = Struct("sched_credit_params", [
libxl_domain_remus_info = Struct("domain_remus_info",[
("interval", integer),
+ ("unsafe", bool),
("blackhole", bool),
("compression", bool),
])
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 01bce2f..3234d45 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7175,13 +7175,17 @@ int main_remus(int argc, char **argv)
memset(&r_info, 0, sizeof(libxl_domain_remus_info));
/* Defaults */
r_info.interval = 200;
+ r_info.unsafe = 0;
r_info.blackhole = 0;
r_info.compression = 1;
- SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) {
+ SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
case 'i':
r_info.interval = atoi(optarg);
break;
+ case 'F':
+ r_info.unsafe = 1;
+ break;
case 'b':
r_info.blackhole = 1;
break;
@@ -7196,6 +7200,11 @@ int main_remus(int argc, char **argv)
break;
}
+ if (!r_info.unsafe && r_info.blackhole) {
+ perror("Unsafe mode must be enabled to replicate to /dev/null");
+ exit(-1);
+ }
+
domid = find_domain(argv[optind]);
host = argv[optind + 1];
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 4279b9f..1e24f1d 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -485,13 +485,16 @@ struct cmd_spec cmd_table[] = {
"Enable Remus HA for domain",
"[options] <Domain> [<host>]",
"-i MS Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
- "-b Replicate memory checkpoints to /dev/null (blackhole)\n"
"-u Disable memory checkpoint compression.\n"
"-s <sshcommand> Use <sshcommand> instead of ssh. String will be passed\n"
" to sh. If empty, run <host> instead of \n"
" ssh <host> xl migrate-receive -r [-e]\n"
"-e Do not wait in the background (on <host>) for the death\n"
- " of the domain."
+ " of the domain.\n"
+ "-F Enable unsafe configurations [-b flags]. Use this option\n"
+ " with caution as failover may not work as intended.\n"
+ "-b Replicate memory checkpoints to /dev/null (blackhole).\n"
+ " Works only in unsafe mode."
},
#endif
{ "devd",
--
1.9.1
* [PATCH v18 08/11] xl/remus: cmdline switches and config vars to control network buffering
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (6 preceding siblings ...)
2014-07-28 9:24 ` [PATCH v18 07/11] xl/remus: cmdline switch to explicitly enable unsafe configurations Yang Hongyang
@ 2014-07-28 9:24 ` Yang Hongyang
2014-07-28 9:24 ` [PATCH v18 09/11] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
` (3 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:24 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Add two members to libxl_domain_remus_info:
netbuf: whether netbuf is enabled
netbufscript: the path of the script which will be run to set up
and tear down the guest's interface.
Add cmdline switches to the 'xl remus' command to enable or disable
network buffering and to specify a domain-specific hotplug script
that sets up network buffering.
Add a new config var 'remus.default.netbufscript' to xl.conf that
allows the user to override the default global script used to
set up network buffering.
Note: Network buffering is enabled by default. Disabling network
buffering requires enabling unsafe mode.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
---
docs/man/xl.conf.pod.5 | 6 ++++++
docs/man/xl.pod.1 | 11 ++++++++++-
tools/libxl/libxl.c | 16 ++++++++++------
tools/libxl/libxl_netbuffer.c | 9 +++++++--
tools/libxl/libxl_types.idl | 2 ++
tools/libxl/xl.c | 4 ++++
tools/libxl/xl.h | 1 +
tools/libxl/xl_cmdimpl.c | 33 +++++++++++++++++++++++++--------
tools/libxl/xl_cmdtable.c | 7 +++++--
9 files changed, 70 insertions(+), 19 deletions(-)
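
As read from the hunks below, the script path is resolved in this
order: the -N argument, then remus.default.netbufscript from xl.conf,
then the built-in remus-netbuf-setup under the Xen script directory.
A minimal standalone sketch of that precedence follows. The function
name netbufscript_pick() and its parameters are hypothetical; in the
patch the first two steps happen in xl and the fallback in libxl's
nic_init().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: picks the netbuf script path.
 *   from_cmdline - value of -N, or NULL if not given
 *   from_xlconf  - remus.default.netbufscript, or NULL if unset
 *   script_dir   - Xen script directory used for the built-in default
 *   buf/buflen   - scratch space for building the default path
 */
static const char *netbufscript_pick(const char *from_cmdline,
                                     const char *from_xlconf,
                                     const char *script_dir,
                                     char *buf, size_t buflen)
{
    if (from_cmdline)
        return from_cmdline;   /* per-invocation override wins */
    if (from_xlconf)
        return from_xlconf;    /* then the global xl.conf default */
    snprintf(buf, buflen, "%s/remus-netbuf-setup", script_dir);
    return buf;                /* finally the built-in script */
}
```

Keeping the built-in fallback in one place means callers that pass no
script at all still end up with a usable path.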
diff --git a/docs/man/xl.conf.pod.5 b/docs/man/xl.conf.pod.5
index 7c43bde..8ae19bb 100644
--- a/docs/man/xl.conf.pod.5
+++ b/docs/man/xl.conf.pod.5
@@ -105,6 +105,12 @@ Configures the default gateway device to set for virtual network devices.
Default: C<None>
+=item B<remus.default.netbufscript="PATH">
+
+Configures the default script used by Remus to setup network buffering.
+
+Default: C</etc/xen/scripts/remus-netbuf-setup>
+
=item B<output_format="json|sxp">
Configures the default output format used by xl when printing "machine
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 3aedead..2e5c36a 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -431,7 +431,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
mechanism between the two hosts.
N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
- There is no support for network or disk buffering at the moment.
+ There is no support for disk buffering at the moment.
B<OPTIONS>
@@ -455,6 +455,11 @@ If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
On the new host, do not wait in the background (on <host>) for the death
of the domain. See the corresponding option of the I<create> subcommand.
+=item B<-N> I<netbufscript>
+
+Use <netbufscript> to setup network buffering instead of the
+default script (/etc/xen/scripts/remus-netbuf-setup).
+
=item B<-F>
Run Remus in unsafe mode. Use this option with caution as failover may
@@ -465,6 +470,10 @@ not work as intended.
Replicate memory checkpoints to /dev/null (blackhole).
Generally useful for debugging. Requires enabling unsafe mode.
+=item B<-n>
+
+Disable network output buffering. Requires enabling unsafe mode.
+
=back
=item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 6e488ca..b329a04 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -797,8 +797,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
libxl__domain_suspend_state *dss;
int rc;
- if (!info->unsafe && info->blackhole) {
- LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null");
+ if (!info->unsafe && (info->blackhole || !info->netbuf)) {
+ LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null and "
+ "disable network buffering");
goto out;
}
@@ -824,11 +825,14 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
/* Convenience aliases */
libxl__remus_devices_state *const rds = &dss->rds;
- if (!libxl__netbuffer_enabled(gc)) {
- LOG(ERROR, "Remus: No support for network buffering");
- goto out;
+ if (info->netbuf) {
+ if (!libxl__netbuffer_enabled(gc)) {
+ LOG(ERROR, "Remus: No support for network buffering");
+ goto out;
+ }
+ rds->device_kind_flags |= LIBXL__REMUS_DEVICE_NIC;
}
- rds->device_kind_flags |= LIBXL__REMUS_DEVICE_NIC;
+
rds->device_kind_flags |= LIBXL__REMUS_DEVICE_DISK;
rds->ao = ao;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 5093e0d..e1d02af 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -43,6 +43,7 @@ int libxl__netbuffer_enabled(libxl__gc *gc)
static int nic_init(libxl__remus_devices_state *rds)
{
int rc, ret;
+ libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);
STATE_AO_GC(rds->ao);
@@ -70,8 +71,12 @@ static int nic_init(libxl__remus_devices_state *rds)
goto out;
}
- rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
- libxl__xen_script_dir_path());
+ if (dss->remus->netbufscript) {
+ rds->netbufscript = libxl__strdup(gc, dss->remus->netbufscript);
+ } else {
+ rds->netbufscript = GCSPRINTF("%s/remus-netbuf-setup",
+ libxl__xen_script_dir_path());
+ }
rc = 0;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f4cff51..78dcee6 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -591,6 +591,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
("unsafe", bool),
("blackhole", bool),
("compression", bool),
+ ("netbuf", bool),
+ ("netbufscript", string),
])
libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl.c b/tools/libxl/xl.c
index 4c5a5ee..f014306 100644
--- a/tools/libxl/xl.c
+++ b/tools/libxl/xl.c
@@ -44,6 +44,7 @@ char *default_vifscript = NULL;
char *default_bridge = NULL;
char *default_gatewaydev = NULL;
char *default_vifbackend = NULL;
+char *default_remus_netbufscript = NULL;
enum output_format default_output_format = OUTPUT_FORMAT_JSON;
int claim_mode = 1;
bool progress_use_cr = 0;
@@ -176,6 +177,9 @@ static void parse_global_config(const char *configfile,
if (!xlu_cfg_get_long (config, "claim_mode", &l, 0))
claim_mode = l;
+ xlu_cfg_replace_string (config, "remus.default.netbufscript",
+ &default_remus_netbufscript, 0);
+
xlu_cfg_destroy(config);
}
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 10a2e66..087eb8c 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -170,6 +170,7 @@ extern char *default_vifscript;
extern char *default_bridge;
extern char *default_gatewaydev;
extern char *default_vifbackend;
+extern char *default_remus_netbufscript;
extern char *blkdev_start;
enum output_format {
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 3234d45..ea77116 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7178,8 +7178,9 @@ int main_remus(int argc, char **argv)
r_info.unsafe = 0;
r_info.blackhole = 0;
r_info.compression = 1;
+ r_info.netbuf = 1;
- SWITCH_FOREACH_OPT(opt, "Fbui:s:e", NULL, "remus", 2) {
+ SWITCH_FOREACH_OPT(opt, "Fbuni:s:N:e", NULL, "remus", 2) {
case 'i':
r_info.interval = atoi(optarg);
break;
@@ -7192,6 +7193,12 @@ int main_remus(int argc, char **argv)
case 'u':
r_info.compression = 0;
break;
+ case 'n':
+ r_info.netbuf = 0;
+ break;
+ case 'N':
+ r_info.netbufscript = optarg;
+ break;
case 's':
ssh_command = optarg;
break;
@@ -7200,14 +7207,18 @@ int main_remus(int argc, char **argv)
break;
}
- if (!r_info.unsafe && r_info.blackhole) {
- perror("Unsafe mode must be enabled to replicate to /dev/null");
+ if (!r_info.unsafe && (r_info.blackhole || !r_info.netbuf)) {
+ perror("Unsafe mode must be enabled to replicate to /dev/null and "
+ "disable network buffering");
exit(-1);
}
domid = find_domain(argv[optind]);
host = argv[optind + 1];
+ if (!r_info.netbufscript)
+ r_info.netbufscript = default_remus_netbufscript;
+
if (r_info.blackhole) {
send_fd = open("/dev/null", O_RDWR, 0644);
if (send_fd < 0) {
@@ -7245,13 +7256,19 @@ int main_remus(int argc, char **argv)
/* Point of no return */
rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd, 0);
- /* If we are here, it means backup has failed/domain suspend failed.
- * Try to resume the domain and exit gracefully.
- * TODO: Split-Brain check.
+ /* check if the domain exists. User may have xl destroyed the
+ * domain to force failover
*/
- fprintf(stderr, "remus sender: libxl_domain_suspend failed"
- " (rc=%d)\n", rc);
+ if (libxl_domain_info(ctx, 0, domid)) {
+ fprintf(stderr, "Remus: Primary domain has been destroyed.\n");
+ close(send_fd);
+ return 0;
+ }
+ /* If we are here, it means remus setup/domain suspend/backup has
+ * failed. Try to resume the domain and exit gracefully.
+ * TODO: Split-Brain check.
+ */
if (rc == ERROR_GUEST_TIMEDOUT)
fprintf(stderr, "Failed to suspend domain at primary.\n");
else {
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 1e24f1d..808f400 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -491,10 +491,13 @@ struct cmd_spec cmd_table[] = {
" ssh <host> xl migrate-receive -r [-e]\n"
"-e Do not wait in the background (on <host>) for the death\n"
" of the domain.\n"
- "-F Enable unsafe configurations [-b flags]. Use this option\n"
+ "-N <netbufscript> Use netbufscript to setup network buffering instead of the\n"
+ " default script (/etc/xen/scripts/remus-netbuf-setup).\n"
+ "-F Enable unsafe configurations [-b|-n flags]. Use this option\n"
" with caution as failover may not work as intended.\n"
"-b Replicate memory checkpoints to /dev/null (blackhole).\n"
- " Works only in unsafe mode."
+ " Works only in unsafe mode.\n"
+ "-n Disable network output buffering. Works only in unsafe mode."
},
#endif
{ "devd",
--
1.9.1
* [PATCH v18 09/11] xl/remus: add a cmdline switch to disable disk replication
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (7 preceding siblings ...)
2014-07-28 9:24 ` [PATCH v18 08/11] xl/remus: cmdline switches and config vars to control network buffering Yang Hongyang
@ 2014-07-28 9:24 ` Yang Hongyang
2014-07-28 9:24 ` [PATCH v18 10/11] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
` (2 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:24 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Disk replication is enabled by default. This patch adds a cmdline
switch to 'xl remus' command to explicitly disable disk replication.
A new boolean field 'diskbuf' is added to the libxl_domain_remus_info
structure to represent this configuration option inside libxl.
Note: Disabling disk replication requires enabling unsafe mode.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
---
docs/man/xl.pod.1 | 6 +++++-
tools/libxl/libxl.c | 9 +++++----
tools/libxl/libxl_types.idl | 1 +
tools/libxl/xl_cmdimpl.c | 13 +++++++++----
tools/libxl/xl_cmdtable.c | 5 +++--
5 files changed, 23 insertions(+), 11 deletions(-)
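
With this patch both device kinds become optional. The way the
enabled kinds are accumulated into a flag word can be sketched as
below. The SKETCH_* constants and their values are assumptions made
for illustration only (the real LIBXL__REMUS_DEVICE_* values are
defined elsewhere in the series), and the libxl__netbuffer_enabled()
check that guards the NIC flag in libxl.c is left out.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real LIBXL__REMUS_DEVICE_* constants
 * live in libxl_internal.h and may differ. */
enum {
    SKETCH_REMUS_DEVICE_NIC  = 1 << 0,
    SKETCH_REMUS_DEVICE_DISK = 1 << 1,
};

/* Builds the set of device kinds Remus should manage from the two
 * user-visible switches (netbuf: no -n given, diskbuf: no -d given). */
static int sketch_device_kind_flags(bool netbuf, bool diskbuf)
{
    int flags = 0;

    if (netbuf)
        flags |= SKETCH_REMUS_DEVICE_NIC;
    if (diskbuf)
        flags |= SKETCH_REMUS_DEVICE_DISK;

    return flags;
}
```

An empty flag word corresponds to 'xl remus -F -n -d', where only
memory checkpoints are replicated.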
diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 2e5c36a..f544945 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -431,7 +431,7 @@ Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
mechanism between the two hosts.
N.B: Remus support in xl is still in experimental (proof-of-concept) phase.
- There is no support for disk buffering at the moment.
+ Disk replication support is limited to DRBD disks.
B<OPTIONS>
@@ -474,6 +474,10 @@ Generally useful for debugging. Requires enabling unsafe mode.
Disable network output buffering. Requires enabling unsafe mode.
+=item B<-d>
+
+Disable disk replication. Requires enabling unsafe mode.
+
=back
=item B<pause> I<domain-id>
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b329a04..8182966 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -797,9 +797,9 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
libxl__domain_suspend_state *dss;
int rc;
- if (!info->unsafe && (info->blackhole || !info->netbuf)) {
- LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null and "
- "disable network buffering");
+ if (!info->unsafe && (info->blackhole || !info->netbuf || !info->diskbuf)) {
+ LOG(ERROR, "Unsafe mode must be enabled to replicate to /dev/null, "
+ "disable network buffering and disk replication");
goto out;
}
@@ -833,7 +833,8 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
rds->device_kind_flags |= LIBXL__REMUS_DEVICE_NIC;
}
- rds->device_kind_flags |= LIBXL__REMUS_DEVICE_DISK;
+ if (info->diskbuf)
+ rds->device_kind_flags |= LIBXL__REMUS_DEVICE_DISK;
rds->ao = ao;
rds->egc = egc;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 78dcee6..542dad3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -593,6 +593,7 @@ libxl_domain_remus_info = Struct("domain_remus_info",[
("compression", bool),
("netbuf", bool),
("netbufscript", string),
+ ("diskbuf", bool),
])
libxl_event_type = Enumeration("event_type", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ea77116..6ea7894 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7179,8 +7179,9 @@ int main_remus(int argc, char **argv)
r_info.blackhole = 0;
r_info.compression = 1;
r_info.netbuf = 1;
+ r_info.diskbuf = 1;
- SWITCH_FOREACH_OPT(opt, "Fbuni:s:N:e", NULL, "remus", 2) {
+ SWITCH_FOREACH_OPT(opt, "Fbundi:s:N:e", NULL, "remus", 2) {
case 'i':
r_info.interval = atoi(optarg);
break;
@@ -7199,6 +7200,9 @@ int main_remus(int argc, char **argv)
case 'N':
r_info.netbufscript = optarg;
break;
+ case 'd':
+ r_info.diskbuf = 0;
+ break;
case 's':
ssh_command = optarg;
break;
@@ -7207,9 +7211,10 @@ int main_remus(int argc, char **argv)
break;
}
- if (!r_info.unsafe && (r_info.blackhole || !r_info.netbuf)) {
- perror("Unsafe mode must be enabled to replicate to /dev/null and "
- "disable network buffering");
+ if (!r_info.unsafe &&
+ (r_info.blackhole || !r_info.netbuf || !r_info.diskbuf)) {
+ fprintf(stderr, "Unsafe mode must be enabled to replicate to /dev/null, "
+ "disable network buffering and disk replication\n");
exit(-1);
}
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 808f400..3b33b05 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -493,11 +493,12 @@ struct cmd_spec cmd_table[] = {
" of the domain.\n"
"-N <netbufscript> Use netbufscript to setup network buffering instead of the\n"
" default script (/etc/xen/scripts/remus-netbuf-setup).\n"
- "-F Enable unsafe configurations [-b|-n flags]. Use this option\n"
+ "-F Enable unsafe configurations [-b|-n|-d flags]. Use this option\n"
" with caution as failover may not work as intended.\n"
"-b Replicate memory checkpoints to /dev/null (blackhole).\n"
" Works only in unsafe mode.\n"
- "-n Disable network output buffering. Works only in unsafe mode."
+ "-n Disable network output buffering. Works only in unsafe mode.\n"
+ "-d Disable disk replication. Works only in unsafe mode."
},
#endif
{ "devd",
--
1.9.1
* [PATCH v18 10/11] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (8 preceding siblings ...)
2014-07-28 9:24 ` [PATCH v18 09/11] xl/remus: add a cmdline switch to disable disk replication Yang Hongyang
@ 2014-07-28 9:24 ` Yang Hongyang
2014-07-28 9:24 ` [PATCH v18 11/11] MAINTAINERS: update maintained files of Remus Yang Hongyang
2014-08-07 1:17 ` [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Hongyang Yang
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:24 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Add LIBXL_HAVE_REMUS to indicate Remus support in libxl
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
tools/libxl/libxl.h | 6 ++++++
1 file changed, 6 insertions(+)
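
A caller would typically test the macro at compile time. The sketch
below shows the pattern. It deliberately omits '#include <libxl.h>' so
that it compiles standalone, which means LIBXL_HAVE_REMUS is undefined
here and the fallback branch is taken; remus_support_string() is a
hypothetical name, not a libxl function.

```c
/* In real code, '#include <libxl.h>' would come first and define
 * LIBXL_HAVE_REMUS on a libxl that carries this patch. */
#ifdef LIBXL_HAVE_REMUS
#define SKETCH_REMUS_AVAILABLE 1
#else
#define SKETCH_REMUS_AVAILABLE 0
#endif

/* Hypothetical helper reporting what the build-time check found. */
static const char *remus_support_string(void)
{
    return SKETCH_REMUS_AVAILABLE
        ? "libxl built with Remus support"
        : "libxl built without Remus support";
}
```

Code that calls libxl_domain_remus_start() can be wrapped in the same
'#ifdef' so that it still builds against an older libxl.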
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5ae6532..81905b3 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -579,6 +579,12 @@ typedef struct libxl__ctx libxl_ctx;
*/
#define LIBXL_HAVE_CPUPOOL_NAME 1
+/*
+ * LIBXL_HAVE_REMUS
+ * If this is defined, then libxl supports remus.
+ */
+#define LIBXL_HAVE_REMUS 1
+
typedef uint8_t libxl_mac[6];
#define LIBXL_MAC_FMT "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx"
#define LIBXL_MAC_FMTLEN ((2*6)+5) /* 6 hex bytes plus 5 colons */
--
1.9.1
* [PATCH v18 11/11] MAINTAINERS: update maintained files of Remus
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (9 preceding siblings ...)
2014-07-28 9:24 ` [PATCH v18 10/11] libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl Yang Hongyang
@ 2014-07-28 9:24 ` Yang Hongyang
2014-08-07 1:17 ` [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Hongyang Yang
11 siblings, 0 replies; 20+ messages in thread
From: Yang Hongyang @ 2014-07-28 9:24 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, ian.jackson,
eddie.dong, rshriram, laijs
Add Remus specific hotplug scripts and libxl files
to the list of maintained files.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 266e47b..c700aa5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -260,8 +260,15 @@ M: Shriram Rajagopalan <rshriram@cs.ubc.ca>
M: Yang Hongyang <yanghy@cn.fujitsu.com>
S: Maintained
F: docs/README.remus
+F: tools/libxc/xc_domain_save.c
+F: tools/libxc/xc_domain_restore.c
F: tools/blktap2/drivers/block-remus.c
F: tools/blktap2/drivers/hashtable*
+F: tools/libxl/libxl_remus_*
+F: tools/libxl/libxl_netbuffer.c
+F: tools/libxl/libxl_nonetbuffer.c
+F: tools/hotplug/Linux/remus-netbuf-setup
+F: tools/hotplug/Linux/block-drbd-probe
SCHEDULING
M: George Dunlap <george.dunlap@eu.citrix.com>
--
1.9.1
* Re: [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk
2014-07-28 9:23 [PATCH v18 00/11] Remus/Libxl: Remus network buffering and drbd disk Yang Hongyang
` (10 preceding siblings ...)
2014-07-28 9:24 ` [PATCH v18 11/11] MAINTAINERS: update maintained files of Remus Yang Hongyang
@ 2014-08-07 1:17 ` Hongyang Yang
11 siblings, 0 replies; 20+ messages in thread
From: Hongyang Yang @ 2014-08-07 1:17 UTC (permalink / raw)
To: xen-devel
Cc: ian.campbell, wency, andrew.cooper3, yunhong.jiang, eddie.dong,
rshriram, ian.jackson, laijs
Ping!
On 07/28/2014 05:23 PM, Yang Hongyang wrote:
> This patch series adds support for network buffering and drbd disk
> in the Remus codebase in libxl.
>
> the code is also hosted on github:
> url: https://github.com/macrosheep/xen/tree/remus-v18
>
> Changes in v18:
> Merge match() and setup() api.
> Reuse libxl__multidev and libxl__ao_device.
> Commit messages and code comments improved. Thanks to Shriram.
> Rebased.
>
> Changes in v17:
> Make remus device abstract layer more generic.
> Addressed Ian J's comments.
>
> Changes in v16:
> Merge libxl__remus_state and libxl__remus_device_state.
> Pass the ops to the device abstract layer instead of defining them in the layer.
> Optimized subkind ops APIs.
> Addressed Ian J's comments.
> Rebased.
>
> Changes in v15:
> The first patch in v14 has been taken, so remove it from the patchset.
> Add a patch to Update maintained files of REMUS.
> Rebased.
>
> Changes in v14:
> Addressed IanJ's comments.
> Rebased.
>
> Changes in v13:
> Addressed Konrad's comments.
> Rebased.
>
> Changes in v12:
> Add disk buffering cmdline switch.
>
> Changes in v11:
> Addressed comments from Ian J and Shriram.
> Add the drbd disk implementation to this patch series.
>
> Changes in V10:
> Restructured the whole patch series.
> Introduce the remus device abstract layer.
> Make remus checkpoint asynchronous.
>
> Changes in V9:
> Use async exec script api to exec scripts.
>
> Changes in V8:
> Applied some comments (by IanJ).
> Merge some struct definitions into their implementations.
> (2/3/5 in V7 => 3 in V8)
>
> Changes in V7:
> Applied missing comments(by IanJ).
> Applied Shriram comments.
>
> Merge the tangled netbuffering setup/teardown code into one patch.
> (2/6/8 in V6 => 5 in V7. 9/10 in V6 => 7 in V7)
>
> Changes in V6:
> Applied Ian Jackson's comments of V5 series.
> the [PATCH 2/4 V5] is split by small functionalities.
>
> [PATCH 4/4 V5] --> [PATCH 13/13] netbuffer is enabled by default.
>
> Changes in V5:
>
> Merge hotplug script patch (2/5) and hotplug script setup/teardown
> patch (3/5) into a single patch.
>
> Changes in V4:
>
> [1/5] Remove check for libnl command line utils in autoconf checks
>
> [2/5] minor nits
>
> [3/5] define LIBXL_HAVE_REMUS_NETBUF in libxl.h
>
> [4/5] clean ups. Make the usleep in checkpoint callback asynchronous
>
> [5/5] minor nits
>
> Changes in V3:
> [1/5] Fix redundant checks in configure scripts
> (based on Ian Campbell's suggestions)
>
> [2/5] Introduce locking in the script, during IFB setup.
> Add xenstore paths used by netbuf scripts
> to xenstore-paths.markdown
>
> [3/5] Hotplug scripts setup/teardown invocations are now asynchronous
> following IanJ's feedback. However, the invocations are still
> sequential.
>
> [5/5] Allow per-domain specification of netbuffer scripts in xl remus
> command.
>
> And minor nits throughout the series based on feedback from
> the last version
>
> Changes in V2:
> [1/5] Configure script will automatically enable/disable network
> buffer support depending on the availability of the appropriate
> libnl3 version. [If libnl3 is unavailable, a warning message will be
> printed to let the user know that the feature has been disabled.]
>
> use macros from pkg.m4 instead of pkg-config commands
> removed redundant checks for libnl3 libraries.
>
> [3,4/5] - Minor nits.
>
> Version 1:
>
> [1/5] Changes to autoconf scripts to check for libnl3. Add linker flags
> to libxl Makefile.
>
> [2/5] External script to setup/teardown network buffering using libnl3's
> CLI. This script will be invoked by libxl before starting Remus.
> The script's main job is to bring up an IFB device with plug qdisc
> attached to it. It then re-routes egress traffic from the guest's
> vif to the IFB device.
>
> [3/5] Libxl code to invoke the external setup script, followed by netlink
> related setup to obtain a handle on the output buffers attached
> to each vif.
>
> [4/5] Libxl interaction with network buffer module in the kernel via
> libnl3 API.
>
> [5/5] xl cmdline switch to explicitly enable network buffering when
> starting remus.
>
>
> A few things to note (by Shriram):
>
> a) Based on previous email discussions, the setup/teardown task has
> been moved to a hotplug style shell script which can be customized as
> desired, instead of implementing it as C code inside libxl.
>
> b) Libnl3 is not available on NetBSD. Nor is it available on CentOS
> (Linux). So I have made network buffering support an optional feature
> so that it can be disabled if desired.
>
> c) NetBSD does not have libnl3. So I have put the setup script under
> tools/hotplug/Linux folder.
>
> thanks
>
> Legend:
> A - acked
> D - previously acked, but a new change was introduced, so the acked-by was dropped
> M - Modified
> S - the same version as last round
> No marker - new patch
>
> Yang Hongyang (11):
> libxl: introduce libxl__multidev_prepare_with_aodev
> libxl: add support for async. function calls when using
> libxl__ao_device
> A autoconf: add libnl3 dependency for Remus network buffering support
> M libxl/remus: introduce an abstract Remus device layer
> M libxl/remus: setup and control network output buffering
> M libxl/remus: setup and control disk replication for DRBD backends
> xl/remus: cmdline switch to explicitly enable unsafe configurations
> M xl/remus: cmdline switches and config vars to control network
> buffering
> M xl/remus: add a cmdline switch to disable disk replication
> S libxl/remus: add LIBXL_HAVE_REMUS to indicate Remus support in libxl
> M MAINTAINERS: update maintained files of Remus
>
> MAINTAINERS | 7 +
> README | 4 +
> config/Tools.mk.in | 4 +
> docs/README.remus | 6 +
> docs/man/xl.conf.pod.5 | 6 +
> docs/man/xl.pod.1 | 30 +-
> docs/misc/xenstore-paths.markdown | 4 +
> tools/configure.ac | 16 +
> tools/hotplug/Linux/Makefile | 2 +
> tools/hotplug/Linux/block-drbd-probe | 85 ++++++
> tools/hotplug/Linux/remus-netbuf-setup | 237 +++++++++++++++
> tools/libxl/Makefile | 15 +
> tools/libxl/libxl.c | 65 +++-
> tools/libxl/libxl.h | 6 +
> tools/libxl/libxl_device.c | 14 +-
> tools/libxl/libxl_dom.c | 172 ++++++++++-
> tools/libxl/libxl_internal.h | 204 ++++++++++++-
> tools/libxl/libxl_netbuffer.c | 521 +++++++++++++++++++++++++++++++++
> tools/libxl/libxl_nonetbuffer.c | 56 ++++
> tools/libxl/libxl_remus_device.c | 302 +++++++++++++++++++
> tools/libxl/libxl_remus_disk_drbd.c | 260 ++++++++++++++++
> tools/libxl/libxl_types.idl | 6 +
> tools/libxl/xl.c | 4 +
> tools/libxl/xl.h | 1 +
> tools/libxl/xl_cmdimpl.c | 43 ++-
> tools/libxl/xl_cmdtable.c | 11 +-
> 26 files changed, 2042 insertions(+), 39 deletions(-)
> create mode 100755 tools/hotplug/Linux/block-drbd-probe
> create mode 100644 tools/hotplug/Linux/remus-netbuf-setup
> create mode 100644 tools/libxl/libxl_netbuffer.c
> create mode 100644 tools/libxl/libxl_nonetbuffer.c
> create mode 100644 tools/libxl/libxl_remus_device.c
> create mode 100644 tools/libxl/libxl_remus_disk_drbd.c
>
--
Thanks,
Yang.