* [RFC] use event channel to improve suspend speed
@ 2007-05-09 0:01 Brendan Cully
2007-05-10 22:13 ` Brendan Cully
0 siblings, 1 reply; 8+ messages in thread
From: Brendan Cully @ 2007-05-09 0:01 UTC (permalink / raw)
To: Xen Developers
[-- Attachment #1: Type: text/plain, Size: 1503 bytes --]
Hi,
I've been doing a little work on improving the latency of guest domain
suspends. I've added a couple of printfs into xc_domain_save around
the last round, and hooked up a harness to loop over the last round
code every couple of seconds. Here are some numbers for a run of 100
last rounds (from just before the suspend callback to just before it
would exit), on a 3.2 GHz P4 with 1 GB of RAM, 128 MB of which goes to
a guest. This approximates the best-case downtime for live migration,
I think.
current code:
avg: 133.57 ms, min: 82.53, max: 559.86, median: 135.63
with the attached patch:
avg: 36.05 ms, min: 33.99, max: 52.14, median: 35.51
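For anyone who wants to reproduce summary lines like the ones above from raw per-round timings, here is a minimal sketch in C. It is not the harness used for these measurements; the struct and function names (round_stats, summarize_rounds) are made up for illustration, and the input is assumed to be an array of round durations in milliseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Summary of a set of timing samples, in milliseconds (hypothetical type). */
struct round_stats {
    double avg, min, max, median;
};

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Compute avg/min/max/median over n samples. Sorts the array in place.
 * Returns 0 on success, -1 if there is nothing to summarize.
 */
static int summarize_rounds(double *ms, size_t n, struct round_stats *out)
{
    double sum = 0.0;
    size_t i;

    if (ms == NULL || out == NULL || n == 0)
        return -1;

    qsort(ms, n, sizeof(*ms), cmp_double);
    for (i = 0; i < n; i++)
        sum += ms[i];

    out->avg = sum / (double)n;
    out->min = ms[0];
    out->max = ms[n - 1];
    /* Even sample counts take the mean of the two middle values. */
    out->median = (n % 2) ? ms[n / 2] : (ms[n / 2 - 1] + ms[n / 2]) / 2.0;
    return 0;
}
```

Reporting the median next to the average matters here because a single slow round (such as the 559.86 ms maximum above) pulls the average up without saying much about the typical case.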
The patch creates an event channel in the guest that fires the suspend
code. xc_save can use this to suspend the domain instead of calling
back to xend, which then writes a xenstore entry, which then causes a
watch to fire in the guest. It seems the xenstore interaction is
fairly slow and very jittery.
This isn't intended for 3.1, but I thought I'd put it out just in case
anyone else finds it interesting. I'd appreciate comments about the
approach.
There's also a fair amount of latency involved in xend receiving the
notification that the domain has suspended and passing that back on to
xc_save. A quick hack to let xc_save simply loop on xc_domain_getinfo
until the domain suspends indicates that it should be fairly easy to
cut the suspend latency in half again, to about 15ms. I'll see about
finding a clean equivalent of this...
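The polling hack mentioned above has roughly the following shape. This is only a sketch under stated assumptions: the probe callback stands in for a wrapper around xc_domain_getinfo, and the names wait_for_suspend, suspended_probe_fn and countdown_probe are hypothetical, chosen so the example compiles without libxc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical probe: returns true once the domain reports "suspended".
 * In xc_save this would wrap xc_domain_getinfo(); here it is a callback
 * so the sketch does not depend on libxc.
 */
typedef bool (*suspended_probe_fn)(void *ctx);

/*
 * Busy-poll until the probe reports suspension or max_polls is exhausted.
 * Returns the number of polls used on success, or -1 if the domain never
 * reported suspension within the budget.
 */
static int wait_for_suspend(suspended_probe_fn probe, void *ctx, int max_polls)
{
    int i;

    for (i = 1; i <= max_polls; i++) {
        if (probe(ctx))
            return i;
    }
    return -1;
}

/* Stand-in probe for testing: reports suspended once *ctx reaches zero. */
static bool countdown_probe(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

A bounded poll count keeps a guest that never suspends from hanging the caller, which is one reason a notification-based equivalent is preferable to spinning.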
Comments?
[-- Attachment #2: suspend-via-evtchn.patch --]
[-- Type: text/x-patch, Size: 6342 bytes --]
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1178666168 25200
# Node ID 9b400773f70d590c4a3ccfd6184e58bcc89cbf2f
# Parent 8f19dcd7132965ecd7f96dccaa25cf8aa94a0af3
Set up an event channel to signal guest suspend.
Have xc_save use it directly instead of waiting for xend to signal the
guest via xenstore. This cuts the overhead for the last round in half
in my tests.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
diff -r 8f19dcd71329 -r 9b400773f70d linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Mon May 07 15:43:52 2007 -0700
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Tue May 08 16:16:08 2007 -0700
@@ -27,6 +27,8 @@ void (*pm_power_off)(void);
void (*pm_power_off)(void);
EXPORT_SYMBOL(pm_power_off);
+int setup_suspend_evtchn(void);
+
void machine_emergency_restart(void)
{
/* We really want to get pending console data out before we die. */
@@ -227,6 +229,7 @@ int __xen_suspend(int fast_suspend)
if (!suspend_cancelled) {
xencons_resume();
xenbus_resume();
+ setup_suspend_evtchn();
} else {
xenbus_suspend_cancel();
}
diff -r 8f19dcd71329 -r 9b400773f70d linux-2.6-xen-sparse/drivers/xen/core/reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/reboot.c Mon May 07 15:43:52 2007 -0700
+++ b/linux-2.6-xen-sparse/drivers/xen/core/reboot.c Tue May 08 16:16:08 2007 -0700
@@ -7,6 +7,7 @@
#include <linux/sysrq.h>
#include <asm/hypervisor.h>
#include <xen/xenbus.h>
+#include <xen/evtchn.h>
#include <linux/kthread.h>
#ifdef HAVE_XEN_PLATFORM_COMPAT_H
@@ -199,6 +200,36 @@ static struct xenbus_watch sysrq_watch =
.callback = sysrq_handler
};
+static irqreturn_t suspend_int(int irq, void* dev_id, struct pt_regs *ptregs)
+{
+ shutting_down = SHUTDOWN_SUSPEND;
+ schedule_work(&shutdown_work);
+
+ return IRQ_HANDLED;
+}
+
+int setup_suspend_evtchn(void)
+{
+ static int irq = -1;
+ int port;
+ char portstr[5]; /* 1024 max? */
+
+ if (irq > 0)
+ unbind_from_irqhandler(irq, NULL);
+
+ /* TODO: get other end dynamically */
+ irq = bind_listening_port_to_irqhandler(0, suspend_int, 0, "suspend", NULL);
+ if (irq <= 0) {
+ return -1;
+ }
+ port = irq_to_evtchn_port(irq);
+ printk(KERN_ERR "suspend: event channel %d\n", port);
+ sprintf(portstr, "%d", port);
+ xenbus_write(XBT_NIL, "device/suspend", "event-channel", portstr);
+
+ return 0;
+}
+
static int setup_shutdown_watcher(void)
{
int err;
@@ -217,6 +248,13 @@ static int setup_shutdown_watcher(void)
if (err) {
printk(KERN_ERR "Failed to set sysrq watcher\n");
return err;
+ }
+
+ /* suspend event channel */
+ err = setup_suspend_evtchn();
+ if (err) {
+ printk(KERN_ERR "Failed to register suspend event channel\n");
+ return err;
}
return 0;
diff -r 8f19dcd71329 -r 9b400773f70d tools/python/xen/xend/XendCheckpoint.py
--- a/tools/python/xen/xend/XendCheckpoint.py Mon May 07 15:43:52 2007 -0700
+++ b/tools/python/xen/xend/XendCheckpoint.py Tue May 08 16:16:08 2007 -0700
@@ -92,6 +92,7 @@ def save(fd, dominfo, network, live, dst
if line == "suspend":
log.debug("Suspending %d ...", dominfo.getDomid())
dominfo.shutdown('suspend')
+ if line in ('suspend', 'suspended'):
dominfo.waitForShutdown()
dominfo.migrateDevices(network, dst, DEV_MIGRATE_STEP2,
domain_name)
diff -r 8f19dcd71329 -r 9b400773f70d tools/xcutils/xc_save.c
--- a/tools/xcutils/xc_save.c Mon May 07 15:43:52 2007 -0700
+++ b/tools/xcutils/xc_save.c Tue May 08 16:16:08 2007 -0700
@@ -23,8 +23,13 @@
#include <xenctrl.h>
#include <xenguest.h>
+#include <errno.h>
+
/* defined in xc_linux_save. Yes, this is cheezy. */
extern int do_last_round;
+
+static int xce = -1;
+static int suspend_evtchn = -1;
/**
* Issue a suspend request through stdout, and receive the acknowledgement
@@ -32,9 +37,24 @@ extern int do_last_round;
*/
static int suspend(int domid)
{
+ static char* suspend = "suspend\n";
+ static char* suspended = "suspended\n";
+
char ans[30];
-
- printf("suspend\n");
+ char* msg;
+ int rc;
+
+ msg = suspend;
+
+ if (suspend_evtchn > 0) {
+ rc = xc_evtchn_notify(xce, suspend_evtchn);
+ if (rc < 0)
+ fprintf(stderr, "failed to notify suspend event channel: %d\n", rc);
+ else
+ msg = suspended;
+ }
+
+ printf(msg);
fflush(stdout);
return (fgets(ans, sizeof(ans), stdin) != NULL &&
@@ -183,6 +203,59 @@ static void sigusr1(int sig)
do_last_round = 1;
}
+static int setup_suspend_evtchn(unsigned int domid)
+{
+ struct xs_handle* xs;
+ char path[128];
+ char *portstr;
+ unsigned int plen;
+ int port;
+ int rc = -1;
+
+ xce = xc_evtchn_open();
+ if (xce < 0) {
+ fprintf(stderr, "failed to open event channel handle\n");
+ return -1;
+ }
+
+ xs = xs_daemon_open_readonly();
+ if (!xs) {
+ fprintf(stderr, "failed to open xenstore handle\n");
+ goto xce_err;
+ }
+
+ sprintf(path, "/local/domain/%d/device/suspend/event-channel", domid);
+
+ portstr = xs_read(xs, XBT_NULL, path, &plen);
+ if (!portstr || !plen) {
+ fprintf(stderr, "failed to read suspend event channel\n");
+ goto xs_err;
+ }
+ port = atoi(portstr);
+ free(portstr);
+
+ fprintf(stderr, "binding to suspend evtchn %u:%d\n", domid, port);
+
+ suspend_evtchn = xc_evtchn_bind_interdomain(xce, domid, port);
+ if (suspend_evtchn < 0) {
+ fprintf(stderr, "failed to bind suspend event channel: %d (%s)\n",
+ suspend_evtchn, strerror(errno));
+ goto xs_err;
+ }
+
+ rc = 0;
+
+ xs_err:
+ xs_daemon_close(xs);
+ xce_err:
+ if (xce >= 0 && rc < 0) {
+ xc_evtchn_close(xce);
+ xce = -1;
+ }
+
+ return rc;
+}
+
int
main(int argc, char **argv)
{
@@ -207,12 +280,17 @@ main(int argc, char **argv)
sa.sa_flags = SA_RESTART;
sigaction(SIGUSR1, &sa, NULL);
+ setup_suspend_evtchn(domid);
+
ret = xc_domain_save(xc_fd, io_fd, domid, maxit, max_f, flags,
&suspend,
flags & XCFLAGS_CONTINUAL ? checkpoint_cb : NULL,
!!(flags & XCFLAGS_HVM),
&init_qemu_maps, &qemu_flip_buffer);
+ if (xce >= 0)
+ xc_evtchn_close(xce);
+
xc_interface_close(xc_fd);
return ret;
* Re: [RFC] use event channel to improve suspend speed
2007-05-09 0:01 [RFC] use event channel to improve suspend speed Brendan Cully
@ 2007-05-10 22:13 ` Brendan Cully
2007-05-10 23:00 ` Daniel P. Berrange
0 siblings, 1 reply; 8+ messages in thread
From: Brendan Cully @ 2007-05-10 22:13 UTC (permalink / raw)
To: Xen Developers
The posted patch was a fairly conservative approach (backward
compatible, equivalent to existing semantics). I've done some
more experimental work that reduces the time for the final round to
about 5ms. Here are the stats for 100 checkpoints:
avg: 5.62 ms, min: 3.96, max: 13.70, median: 4.86
It turns out the biggest remaining delay is (surprise!) xenstored. To
get the above numbers I unwired xenstored from VIRQ_DOM_EXC and let
xc_save bind to it.
Obviously this isn't a practical approach. I'd love to hear any ideas
about the right way to avoid the xenstore penalty though. My current
thought is that it might be possible to arrange to register a dynamic
virq from xc_save into xen for a target domain, and then have xen fire
it on suspend instead of DOM_EXC (iff it's installed, otherwise use
the normal path).
Any advice would be welcome.
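To make the proposed fallback concrete, here is a small model of the decision in C: use the per-domain port only for suspend and only if one is installed, otherwise take the normal DOM_EXC path. Everything in it (dom_model, choose_notify_path, the enum values) is a hypothetical stand-in for illustration, not Xen's actual data structures.

```c
#include <assert.h>

/* Which notification the hypervisor would raise on shutdown (model only). */
enum notify_path { NOTIFY_SUBSCRIBER_PORT, NOTIFY_GLOBAL_VIRQ };

/* Hypothetical shutdown reasons; not Xen's real constants. */
enum shutdown_reason {
    REASON_POWEROFF,
    REASON_REBOOT,
    REASON_SUSPEND,
    REASON_CRASH
};

/* Minimal model of the per-domain state the idea needs. */
struct dom_model {
    enum shutdown_reason reason;
    int subscriber_port;    /* <= 0 means nobody has registered a port */
};

/*
 * Fire the registered port only for a suspend with a subscriber installed;
 * every other case keeps the existing global notification.
 */
static enum notify_path choose_notify_path(const struct dom_model *d)
{
    if (d->reason == REASON_SUSPEND && d->subscriber_port > 0)
        return NOTIFY_SUBSCRIBER_PORT;
    return NOTIFY_GLOBAL_VIRQ;
}
```

Keeping the global path as the default means tools that never register a port see no change in behaviour.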
* Re: [RFC] use event channel to improve suspend speed
2007-05-10 22:13 ` Brendan Cully
@ 2007-05-10 23:00 ` Daniel P. Berrange
2007-05-11 0:06 ` Brendan Cully
2007-05-11 6:55 ` Keir Fraser
0 siblings, 2 replies; 8+ messages in thread
From: Daniel P. Berrange @ 2007-05-10 23:00 UTC (permalink / raw)
To: xen-devel
On Thu, May 10, 2007 at 03:13:10PM -0700, Brendan Cully wrote:
> The posted patch was a fairly conservative approach (backward
> compatible, equivalent to existing semantics). I've done some
> more experimental work that reduces the time for the final round to
> about 5ms. Here are the stats for 100 checkpoints:
>
> avg: 5.62 ms, min: 3.96, max: 13.70, median: 4.86
>
> It turns out the biggest remaining delay is (surprise!) xenstored. To
> get the above numbers I unwired xenstored from VIRQ_DOM_EXC and let
> xc_save bind to it.
>
> Obviously this isn't a practical approach. I'd love to hear any ideas
> about the right way to avoid the xenstore penalty though. My current
> thought is that it might be possible to arrange to register a dynamic
> virq from xc_save into xen for a target domain, and then have xen fire
> it on suspend instead of DOM_EXC (iff it's installed, otherwise use
> the normal path).
It would be interesting to know what aspect of the xenstore interaction
is responsible for the slowdown. In particular, whether it is a fundamental
architectural constraint, or whether it is merely due to the poor performance
of the current impl. We already know from previous tests that XenD impl of
transactions absolutely kills performance of various XenD operations due to
the vast amount of unnecessary I/O it does.
If fixing the XenstoreD transaction code were to help suspend performance
too, it might be a better option than re-writing all code which touches
xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
be a way of testing whether it's the I/O which is hurting suspend time.
Dan
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
* Re: [RFC] use event channel to improve suspend speed
2007-05-10 23:00 ` Daniel P. Berrange
@ 2007-05-11 0:06 ` Brendan Cully
2007-05-11 6:55 ` Keir Fraser
1 sibling, 0 replies; 8+ messages in thread
From: Brendan Cully @ 2007-05-11 0:06 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: xen-devel
On Friday, 11 May 2007 at 00:00, Daniel P. Berrange wrote:
> On Thu, May 10, 2007 at 03:13:10PM -0700, Brendan Cully wrote:
> > The posted patch was a fairly conservative approach (backward
> > compatible, equivalent to existing semantics). I've done some
> > more experimental work that reduces the time for the final round to
> > about 5ms. Here are the stats for 100 checkpoints:
> >
> > avg: 5.62 ms, min: 3.96, max: 13.70, median: 4.86
> >
> > It turns out the biggest remaining delay is (surprise!) xenstored. To
> > get the above numbers I unwired xenstored from VIRQ_DOM_EXC and let
> > xc_save bind to it.
> >
> > Obviously this isn't a practical approach. I'd love to hear any ideas
> > about the right way to avoid the xenstore penalty though. My current
> > thought is that it might be possible to arrange to register a dynamic
> > virq from xc_save into xen for a target domain, and then have xen fire
> > it on suspend instead of DOM_EXC (iff it's installed, otherwise use
> > the normal path).
>
> It would be interesting to know what aspect of the xenstore interaction
> is responsible for the slowdown. In particular, whether it is a fundamental
> architectural constraint, or whether it is merely due to the poor performance
> of the current impl. We already know from previous tests that XenD impl of
> transactions absolutely kills performance of various XenD operations due to
> the vast amount of unnecessary I/O it does.
>
> If fixing the XenstoreD transaction code were to help suspend performance
> too, it might be a better option than re-writing all code which touches
> xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> be a way of testing whether it's the I/O which is hurting suspend time.
That's certainly part of it. If I rewrite xc_save to set up a watch on
@releaseDomain, then select on the xs handle (deferring actually
reading the watch until after the checkpoint), then I get the
following timings:
/var/lib/xenstored on ext3:
avg: 29.41 ms, min: 27.65, max: 40.33, median: 29.30
on tmpfs:
avg: 17.58 ms, min: 7.05, max: 43.88, median: 16.57
It's still awfully jittery though, and significantly slower than the
~5ms I got with xc_save bound directly to VIRQ_DOM_EXC. I'd guess
that the watch mechanism is the problem. I haven't looked very closely
at its internals, but I wonder if it's just delivering synchronous
notifications to the watcher list in order (in this case, making
xc_save wait until xend has handled the watch).
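The "select on the handle, read the watch later" pattern described above can be sketched as follows. This is an illustration only: a plain file descriptor stands in for the xenstore connection (in xc_save it would presumably come from the xs handle, e.g. via xs_fileno()), and wait_readable is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable or timeout_ms elapses, without consuming
 * any data, so the actual read can be deferred until after the checkpoint.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Only one descriptor is watched, so a positive result is always 1. */
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

Because select() only reports readiness, the pending watch event stays queued on the descriptor; the time-critical path ends as soon as the descriptor becomes readable, and parsing the event can happen after the final round.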
* Re: [RFC] use event channel to improve suspend speed
2007-05-10 23:00 ` Daniel P. Berrange
2007-05-11 0:06 ` Brendan Cully
@ 2007-05-11 6:55 ` Keir Fraser
2007-05-25 0:06 ` Brendan Cully
1 sibling, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2007-05-11 6:55 UTC (permalink / raw)
To: Daniel P. Berrange, xen-devel
On 11/5/07 00:00, "Daniel P. Berrange" <berrange@redhat.com> wrote:
> It would be interesting to know what aspect of the xenstore interaction
> is responsible for the slowdown. In particular, whether it is a fundamental
> architectural constraint, or whether it is merely due to the poor performance
> of the current impl. We already know from previous tests that XenD impl of
> transactions absolutely kills performance of various XenD operations due to
> the vast amount of unnecessary I/O it does.
>
> If fixing the XenstoreD transaction code were to help suspend performance
> too, it might be a better option than re-writing all code which touches
> xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> be a way of testing whether it's the I/O which is hurting suspend time.
Yes. We could go either way -- it wouldn't be too bad to add support via
dynamic VIRQ_DOM_EXC for example, or add other things to get xenstore off
the critical path for save/restore. But if the problem is that xenstored
sucks it probably is worth investing a bit of time to tackle the problem
directly and see where the time is going. We could end up with optimisations
which have benefits beyond just save/restore.
-- Keir
* Re: [RFC] use event channel to improve suspend speed
2007-05-11 6:55 ` Keir Fraser
@ 2007-05-25 0:06 ` Brendan Cully
2007-05-25 6:46 ` Keir Fraser
0 siblings, 1 reply; 8+ messages in thread
From: Brendan Cully @ 2007-05-25 0:06 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel, Daniel P. Berrange
[-- Attachment #1: Type: text/plain, Size: 2692 bytes --]
On Friday, 11 May 2007 at 07:55, Keir Fraser wrote:
> On 11/5/07 00:00, "Daniel P. Berrange" <berrange@redhat.com> wrote:
>
> > It would be interesting to know what aspect of the xenstore interaction
> > is responsible for the slowdown. In particular, whether it is a fundamental
> > architectural constraint, or whether it is merely due to the poor performance
> > of the current impl. We already know from previous tests that XenD impl of
> > transactions absolutely kills performance of various XenD operations due to
> > the vast amount of unnecessary I/O it does.
> >
> > If fixing the XenstoreD transaction code were to help suspend performance
> > too, it might be a better option than re-writing all code which touches
> > xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> > be a way of testing whether it's the I/O which is hurting suspend time.
>
> Yes. We could go either way -- it wouldn't be too bad to add support via
> dynamic VIRQ_DOM_EXC for example, or add other things to get xenstore off
> the critical path for save/restore. But if the problem is that xenstored
> sucks it probably is worth investing a bit of time to tackle the problem
> directly and see where the time is going. We could end up with optimisations
> which have benefits beyond just save/restore.
I'm sure xenstore could be made significantly faster, but barring a
redesign maybe it's better just to use it for low-frequency
transactions with pretty loose latency expectations? Running the
suspend notification through xenstore, to xend and finally back to
xc_save (as the current code does) seems convoluted, and bound to
create opportunities for bad scheduling compared to directly notifying
xc_save.
In case there's interest, I'll attach the two patches I'm using to
speed up checkpointing (and live migration downtime). As I mentioned
earlier, the first patch should be semantically equivalent to existing
code, and cuts downtime to about 30-35ms. The second asynchronously
notifies xend that the domain has been suspended, so that final round
memory copying may begin before device migration stage 2. This is a
semantic change, but I can't think of a concrete drawback. It's a
little rough-and-ready -- suggestions for improvement are welcome.
Here are some stats on final round time (100 runs):
xen 3.1:
avg: 93.40 ms, min: 72.59, max: 432.46, median: 85.10
patch 1 (trigger suspend via event channel):
avg: 43.69 ms, min: 35.21, max: 409.50, median: 37.21
patch 1, /var/lib/xenstored on tmpfs:
avg: 33.88 ms, min: 27.01, max: 369.21, median: 28.34
patch 2 (receive suspended notification via event channel):
avg: 4.95 ms, min: 3.46, max: 14.73, median: 4.63
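To put the averages above in relative terms, a tiny helper like the following does the arithmetic. It is a sketch with a hypothetical function name (speedup); the inputs are simply the average times reported above.

```c
#include <assert.h>

/*
 * Ratio of baseline time to improved time (both in the same unit).
 * Returns -1.0 if either input is not a positive duration.
 */
static double speedup(double baseline_ms, double improved_ms)
{
    if (baseline_ms <= 0.0 || improved_ms <= 0.0)
        return -1.0;
    return baseline_ms / improved_ms;
}
```

With the averages listed, speedup(93.40, 43.69) is a little over 2x for patch 1, and speedup(93.40, 4.95) is roughly 19x for patch 2.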
[-- Attachment #2: suspend-evtchn.patch --]
[-- Type: text/x-patch, Size: 6441 bytes --]
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1180051484 25200
# Node ID 5c5b8a69c9d631399b3a139dea2026f7a11f6dd7
# Parent 6649c29d3720fa9087eade3a9c77f5785ff54279
Set up an event channel to signal guest suspend.
Have xc_save use it directly instead of waiting for xend to signal the
guest via xenstore. This cuts the overhead for the last round in half
in my tests.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
diff -r 6649c29d3720 -r 5c5b8a69c9d6 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Thu May 24 17:04:44 2007 -0700
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Thu May 24 17:04:44 2007 -0700
@@ -27,6 +27,8 @@ void (*pm_power_off)(void);
void (*pm_power_off)(void);
EXPORT_SYMBOL(pm_power_off);
+int setup_suspend_evtchn(void);
+
void machine_emergency_restart(void)
{
/* We really want to get pending console data out before we die. */
@@ -230,6 +232,7 @@ int __xen_suspend(int fast_suspend)
if (!suspend_cancelled) {
xencons_resume();
xenbus_resume();
+ setup_suspend_evtchn();
} else {
xenbus_suspend_cancel();
}
diff -r 6649c29d3720 -r 5c5b8a69c9d6 linux-2.6-xen-sparse/drivers/xen/core/reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/reboot.c Thu May 24 17:04:44 2007 -0700
+++ b/linux-2.6-xen-sparse/drivers/xen/core/reboot.c Thu May 24 17:04:44 2007 -0700
@@ -7,6 +7,7 @@
#include <linux/sysrq.h>
#include <asm/hypervisor.h>
#include <xen/xenbus.h>
+#include <xen/evtchn.h>
#include <linux/kthread.h>
#ifdef HAVE_XEN_PLATFORM_COMPAT_H
@@ -194,6 +195,36 @@ static struct xenbus_watch sysrq_watch =
.callback = sysrq_handler
};
+static irqreturn_t suspend_int(int irq, void* dev_id, struct pt_regs *ptregs)
+{
+ shutting_down = SHUTDOWN_SUSPEND;
+ schedule_work(&shutdown_work);
+
+ return IRQ_HANDLED;
+}
+
+int setup_suspend_evtchn(void)
+{
+ static int irq = -1;
+ int port;
+ char portstr[5]; /* 1024 max? */
+
+ if (irq > 0)
+ unbind_from_irqhandler(irq, NULL);
+
+ /* TODO: get other end dynamically */
+ irq = bind_listening_port_to_irqhandler(0, suspend_int, 0, "suspend", NULL);
+ if (irq <= 0) {
+ return -1;
+ }
+ port = irq_to_evtchn_port(irq);
+ printk(KERN_ERR "suspend: event channel %d\n", port);
+ sprintf(portstr, "%d", port);
+ xenbus_write(XBT_NIL, "device/suspend", "event-channel", portstr);
+
+ return 0;
+}
+
static int setup_shutdown_watcher(void)
{
int err;
@@ -212,6 +243,13 @@ static int setup_shutdown_watcher(void)
if (err) {
printk(KERN_ERR "Failed to set sysrq watcher\n");
return err;
+ }
+
+ /* suspend event channel */
+ err = setup_suspend_evtchn();
+ if (err) {
+ printk(KERN_ERR "Failed to register suspend event channel\n");
+ return err;
}
return 0;
diff -r 6649c29d3720 -r 5c5b8a69c9d6 tools/python/xen/xend/XendCheckpoint.py
--- a/tools/python/xen/xend/XendCheckpoint.py Thu May 24 17:04:44 2007 -0700
+++ b/tools/python/xen/xend/XendCheckpoint.py Thu May 24 17:04:44 2007 -0700
@@ -92,6 +92,7 @@ def save(fd, dominfo, network, live, dst
if line == "suspend":
log.debug("Suspending %d ...", dominfo.getDomid())
dominfo.shutdown('suspend')
+ if line in ('suspend', 'suspended'):
dominfo.waitForShutdown()
dominfo.migrateDevices(network, dst, DEV_MIGRATE_STEP2,
domain_name)
diff -r 6649c29d3720 -r 5c5b8a69c9d6 tools/xcutils/xc_save.c
--- a/tools/xcutils/xc_save.c Thu May 24 17:04:44 2007 -0700
+++ b/tools/xcutils/xc_save.c Thu May 24 17:04:44 2007 -0700
@@ -23,8 +23,13 @@
#include <xenctrl.h>
#include <xenguest.h>
+#include <errno.h>
+
/* defined in xc_linux_save. Yes, this is cheezy. */
extern int do_last_round;
+
+static int xce = -1;
+static int suspend_evtchn = -1;
/**
* Issue a suspend request through stdout, and receive the acknowledgement
@@ -32,9 +37,24 @@ extern int do_last_round;
*/
static int suspend(int domid)
{
+ static char* suspend = "suspend\n";
+ static char* suspended = "suspended\n";
+
char ans[30];
-
- printf("suspend\n");
+ char* msg;
+ int rc;
+
+ msg = suspend;
+
+ if (suspend_evtchn > 0) {
+ rc = xc_evtchn_notify(xce, suspend_evtchn);
+ if (rc < 0)
+ fprintf(stderr, "failed to notify suspend event channel: %d\n", rc);
+ else
+ msg = suspended;
+ }
+
+ printf(msg);
fflush(stdout);
return (fgets(ans, sizeof(ans), stdin) != NULL &&
@@ -164,6 +184,68 @@ static void sigusr1(int sig)
do_last_round = 1;
}
+static int setup_suspend_evtchn(unsigned int domid)
+{
+ struct xs_handle* xs;
+ char path[128];
+ char *portstr;
+ unsigned int plen;
+ int port;
+ int rc = -1;
+
+ xce = xc_evtchn_open();
+ if (xce < 0) {
+ fprintf(stderr, "failed to open event channel handle\n");
+ return -1;
+ }
+
+ xs = xs_daemon_open_readonly();
+ if (!xs) {
+ fprintf(stderr, "failed to open xenstore handle\n");
+ goto xce_err;
+ }
+
+ sprintf(path, "/local/domain/%d/device/suspend/event-channel", domid);
+
+ portstr = xs_read(xs, XBT_NULL, path, &plen);
+ if (!portstr || !plen) {
+ fprintf(stderr, "failed to read suspend event channel\n");
+ goto xs_err;
+ }
+ port = atoi(portstr);
+ free(portstr);
+
+ fprintf(stderr, "binding to suspend evtchn %u:%d\n", domid, port);
+
+ suspend_evtchn = xc_evtchn_bind_interdomain(xce, domid, port);
+ if (suspend_evtchn < 0) {
+ fprintf(stderr, "failed to bind suspend event channel: %d (%s)\n",
+ suspend_evtchn, strerror(errno));
+ goto xs_err;
+ }
+
+ rc = 0;
+
+ xs_err:
+ xs_daemon_close(xs);
+ xce_err:
+ if (xce >= 0 && rc < 0) {
+ xc_evtchn_close(xce);
+ xce = -1;
+ }
+
+ return rc;
+}
+
+static void release_suspend_evtchn(void)
+{
+ if (xce >= 0) {
+ if (suspend_evtchn > 0)
+ xc_evtchn_unbind(xce, suspend_evtchn);
+ xc_evtchn_close(xce);
+ }
+}
+
int
main(int argc, char **argv)
{
@@ -188,10 +270,14 @@ main(int argc, char **argv)
sa.sa_flags = SA_RESTART;
sigaction(SIGUSR1, &sa, NULL);
+ setup_suspend_evtchn(domid);
+
ret = xc_domain_save(xc_fd, io_fd, domid, maxit, max_f, flags,
&suspend, !!(flags & XCFLAGS_HVM),
&init_qemu_maps, &qemu_flip_buffer);
+ release_suspend_evtchn();
+
xc_interface_close(xc_fd);
return ret;
[-- Attachment #3: subscribe-suspend.patch --]
[-- Type: text/x-patch, Size: 8354 bytes --]
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1180051484 25200
# Node ID fcc62feb1d2c92cfece16c0b11b8bf5ac3c411cb
# Parent 5c5b8a69c9d631399b3a139dea2026f7a11f6dd7
Add facility to subscribe to a domain via event channel.
This event channel will be notified when the domain transitions to the
suspended state, which can be much faster than raising VIRQ_DOM_EXC
and waiting for the notification to be propagated via xenstore.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c Thu May 24 17:04:44 2007 -0700
+++ b/tools/libxc/xc_domain.c Thu May 24 17:04:44 2007 -0700
@@ -696,6 +696,17 @@ int xc_get_hvm_param(int handle, domid_t
return rc;
}
+int xc_dom_subscribe(int xc_handle, domid_t dom, evtchn_port_t port)
+{
+ DECLARE_DOMCTL;
+
+ domctl.cmd = XEN_DOMCTL_subscribe;
+ domctl.domain = dom;
+ domctl.u.subscribe.port = port;
+
+ return do_domctl(xc_handle, &domctl);
+}
+
/*
* Local variables:
* mode: C
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h Thu May 24 17:04:44 2007 -0700
+++ b/tools/libxc/xenctrl.h Thu May 24 17:04:44 2007 -0700
@@ -728,6 +728,12 @@ evtchn_port_t xc_evtchn_pending(int xce_
*/
int xc_evtchn_unmask(int xce_handle, evtchn_port_t port);
+/*
+ * Subscribe to state changes in a domain via evtchn.
+ * Returns -1 on failure, in which case errno will be set appropriately.
+ */
+int xc_dom_subscribe(int xc_handle, domid_t domid, evtchn_port_t port);
+
/**************************
* GRANT TABLE OPERATIONS *
**************************/
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c tools/python/xen/xend/XendCheckpoint.py
--- a/tools/python/xen/xend/XendCheckpoint.py Thu May 24 17:04:44 2007 -0700
+++ b/tools/python/xen/xend/XendCheckpoint.py Thu May 24 17:04:44 2007 -0700
@@ -92,7 +92,7 @@ def save(fd, dominfo, network, live, dst
if line == "suspend":
log.debug("Suspending %d ...", dominfo.getDomid())
dominfo.shutdown('suspend')
- if line in ('suspend', 'suspended'):
+ if line in ('suspend',):
dominfo.waitForShutdown()
dominfo.migrateDevices(network, dst, DEV_MIGRATE_STEP2,
domain_name)
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c tools/xcutils/xc_save.c
--- a/tools/xcutils/xc_save.c Thu May 24 17:04:44 2007 -0700
+++ b/tools/xcutils/xc_save.c Thu May 24 17:04:44 2007 -0700
@@ -28,8 +28,12 @@
/* defined in xc_linux_save. Yes, this is cheezy. */
extern int do_last_round;
+unsigned int xc_fd;
static int xce = -1;
+/* notify this to cause guest to start suspend */
static int suspend_evtchn = -1;
+/* get call back on this when guest has finished suspend */
+static int suspended_evtchn = -1;
/**
* Issue a suspend request through stdout, and receive the acknowledgement
@@ -50,8 +54,28 @@ static int suspend(int domid)
rc = xc_evtchn_notify(xce, suspend_evtchn);
if (rc < 0)
fprintf(stderr, "failed to notify suspend event channel: %d\n", rc);
- else
+ else {
msg = suspended;
+ if (suspended_evtchn > 0) {
+ do {
+ rc = xc_evtchn_pending(xce);
+ } while (rc > 0 && rc != suspended_evtchn);
+ if (rc <= 0) {
+ fprintf(stderr, "failed to receive suspend notification: %d\n", rc);
+ return 0;
+ }
+ if (xc_evtchn_unmask(xce, suspended_evtchn) < 0) {
+ fprintf(stderr, "failed to unmask suspend notification channel: %d\n",
+ rc);
+ return 0;
+ }
+
+ printf(msg);
+ fflush(stdout);
+
+ return 1;
+ }
+ }
}
printf(msg);
@@ -224,8 +248,35 @@ static int setup_suspend_evtchn(unsigned
goto xs_err;
}
+ suspended_evtchn = xc_evtchn_bind_unbound_port(xce, domid);
+ if (suspended_evtchn < 0) {
+ fprintf(stderr,
+ "failed to bind suspend notification event channel: %d (%s)\n",
+ suspended_evtchn, strerror(errno));
+ goto se_err;
+ }
+
+ rc = xc_dom_subscribe(xc_fd, domid, suspended_evtchn);
+ if (rc < 0) {
+ fprintf(stderr,
+ "failed to subscribe to domain: %d (%s)\n", rc, strerror(errno));
+ goto se2_err;
+ }
+ fprintf(stderr, "subscribed to domain %d on port %d\n",
+ domid, suspended_evtchn);
+
rc = 0;
+ se2_err:
+ if (rc < 0) {
+ xc_evtchn_unbind(xce, suspended_evtchn);
+ suspended_evtchn = -1;
+ }
+ se_err:
+ if (rc < 0) {
+ xc_evtchn_unbind(xce, suspend_evtchn);
+ suspend_evtchn = -1;
+ }
xs_err:
xs_daemon_close(xs);
xce_err:
@@ -237,8 +288,13 @@ static int setup_suspend_evtchn(unsigned
return rc;
}
-static void release_suspend_evtchn(void)
-{
+static void release_suspend_evtchn(unsigned int domid)
+{
+ /* TODO: teach xen to clean up if port is unbound */
+ if (suspended_evtchn > 0) {
+ xc_dom_subscribe(xc_fd, domid, 0);
+ suspended_evtchn = 0;
+ }
if (xce >= 0) {
if (suspend_evtchn > 0)
xc_evtchn_unbind(xce, suspend_evtchn);
@@ -249,7 +305,7 @@ int
int
main(int argc, char **argv)
{
- unsigned int xc_fd, io_fd, domid, maxit, max_f, flags;
+ unsigned int io_fd, domid, maxit, max_f, flags;
int ret;
struct sigaction sa;
@@ -276,7 +332,7 @@ main(int argc, char **argv)
&suspend, !!(flags & XCFLAGS_HVM),
&init_qemu_maps, &qemu_flip_buffer);
- release_suspend_evtchn();
+ release_suspend_evtchn(domid);
xc_interface_close(xc_fd);
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c xen/common/domain.c
--- a/xen/common/domain.c Thu May 24 17:04:44 2007 -0700
+++ b/xen/common/domain.c Thu May 24 17:04:44 2007 -0700
@@ -103,7 +103,13 @@ static void __domain_finalise_shutdown(s
for_each_vcpu ( d, v )
vcpu_sleep_nosync(v);
- send_guest_global_virq(dom0, VIRQ_DOM_EXC);
+ if ( d->shutdown_code == SHUTDOWN_suspend
+ && d->suspend_evtchn > 0 )
+ {
+ evtchn_set_pending(dom0->vcpu[0], d->suspend_evtchn);
+ }
+ else
+ send_guest_global_virq(dom0, VIRQ_DOM_EXC);
}
static void vcpu_check_shutdown(struct vcpu *v)
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c xen/common/domctl.c
--- a/xen/common/domctl.c Thu May 24 17:04:44 2007 -0700
+++ b/xen/common/domctl.c Thu May 24 17:04:44 2007 -0700
@@ -707,6 +707,21 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
}
break;
+ case XEN_DOMCTL_subscribe:
+ {
+ struct domain *d;
+
+ ret = -ESRCH;
+ d = rcu_lock_domain_by_id(op->domain);
+ if ( d != NULL )
+ {
+ d->suspend_evtchn = op->u.subscribe.port;
+ rcu_unlock_domain(d);
+ ret = 0;
+ }
+ }
+ break;
+
default:
ret = arch_do_domctl(op, u_domctl);
break;
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c xen/include/public/domctl.h
--- a/xen/include/public/domctl.h Thu May 24 17:04:44 2007 -0700
+++ b/xen/include/public/domctl.h Thu May 24 17:04:44 2007 -0700
@@ -429,7 +429,13 @@ typedef struct xen_domctl_sendtrigger xe
typedef struct xen_domctl_sendtrigger xen_domctl_sendtrigger_t;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_sendtrigger_t);
-
+#define XEN_DOMCTL_subscribe 29
+struct xen_domctl_subscribe {
+ uint32_t port; /* IN */
+};
+typedef struct xen_domctl_subscribe xen_domctl_subscribe_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_subscribe_t);
+
struct xen_domctl {
uint32_t cmd;
uint32_t interface_version; /* XEN_DOMCTL_INTERFACE_VERSION */
@@ -459,6 +465,7 @@ struct xen_domctl {
struct xen_domctl_hvmcontext hvmcontext;
struct xen_domctl_address_size address_size;
struct xen_domctl_sendtrigger sendtrigger;
+ struct xen_domctl_subscribe subscribe;
uint8_t pad[128];
} u;
};
diff -r 5c5b8a69c9d6 -r fcc62feb1d2c xen/include/xen/sched.h
--- a/xen/include/xen/sched.h Thu May 24 17:04:44 2007 -0700
+++ b/xen/include/xen/sched.h Thu May 24 17:04:44 2007 -0700
@@ -201,6 +201,10 @@ struct domain
bool_t is_shut_down; /* fully shut down? */
int shutdown_code;
+ /* If this is not 0, send suspend notification here instead of
+ * raising DOM_EXC */
+ int suspend_evtchn;
+
atomic_t pause_count;
unsigned long vm_assist;
* Re: [RFC] use event channel to improve suspend speed
2007-05-25 0:06 ` Brendan Cully
@ 2007-05-25 6:46 ` Keir Fraser
2007-05-25 23:41 ` Brendan Cully
0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2007-05-25 6:46 UTC (permalink / raw)
To: Brendan Cully; +Cc: xen-devel, Daniel P. Berrange
On 25/5/07 01:06, "Brendan Cully" <brendan@cs.ubc.ca> wrote:
> In case there's interest, I'll attach the two patches I'm using to
> speed up checkpointing (and live migration downtime). As I mentioned
> earlier, the first patch should be semantically equivalent to existing
> code, and cuts downtime to about 30-35ms. The second asynchronously
> notifies xend that the domain has been suspended, so that final-round
> memory copying may begin before device migration stage 2. This is a
> semantic change, but I can't think of a concrete drawback. It's a
> little rough-and-ready -- suggestions for improvement are welcome.
Can patch 2 be used without patch 1? The fact it doesn't need to change the
guest interface again is a big advantage. And it seems to provide by far the
larger proportional speedup.
-- Keir
* Re: [RFC] use event channel to improve suspend speed
2007-05-25 6:46 ` Keir Fraser
@ 2007-05-25 23:41 ` Brendan Cully
0 siblings, 0 replies; 8+ messages in thread
From: Brendan Cully @ 2007-05-25 23:41 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel, Daniel P. Berrange
On Friday, 25 May 2007 at 07:46, Keir Fraser wrote:
> On 25/5/07 01:06, "Brendan Cully" <brendan@cs.ubc.ca> wrote:
>
> > In case there's interest, I'll attach the two patches I'm using to
> > speed up checkpointing (and live migration downtime). As I mentioned
> > earlier, the first patch should be semantically equivalent to existing
> > code, and cuts downtime to about 30-35ms. The second asynchronously
> > notifies xend that the domain has been suspended, so that final-round
> > memory copying may begin before device migration stage 2. This is a
> > semantic change, but I can't think of a concrete drawback. It's a
> > little rough-and-ready -- suggestions for improvement are welcome.
>
> Can patch 2 be used without patch 1? The fact it doesn't need to change the
> guest interface again is a big advantage. And it seems to provide by far the
> larger proportional speedup.
Yes, it's possible to separate them out. But they work best together,
since that's the only case where xenstore is completely removed from
the suspend path. I've done another version of patch 1 in which
xc_save triggers the suspend by writing to xenstore (instead of asking
xend to do it), with a vanilla guest kernel. With patch 2 also
applied, I get this:
avg: 23.17 ms, min: 9.08, max: 545.51, median: 13.66
implying the base penalty for using xenstore is about 5-10ms, and we
still suffer some pretty serious jitter.
The same test with /var/lib/xenstored mounted on tmpfs is much less
jittery:
avg: 14.62 ms, min: 8.27, max: 32.67, median: 11.71
but there's still a 5-20ms xenstore penalty.
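
For anyone who wants to produce summary lines in the same form, here is a
minimal sketch of how avg/min/max/median could be computed from a set of
per-round timings. It is not the harness used for the numbers above:
summarize_latency() and struct latency_summary are hypothetical names
introduced only for illustration, and the samples are assumed to be in
milliseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary record (illustration only, not part of the patch). */
struct latency_summary {
    double avg;
    double min;
    double max;
    double median;
};

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/*
 * Summarise n latency samples (assumed to be milliseconds).
 * Returns 0 on success, -1 on invalid input or allocation failure.
 * The caller's array is left untouched; sorting is done on a copy.
 */
int summarize_latency(const double *samples, size_t n,
                      struct latency_summary *out)
{
    double *sorted;
    double sum = 0.0;
    size_t i;

    if (samples == NULL || out == NULL || n == 0)
        return -1;

    sorted = malloc(n * sizeof(*sorted));
    if (sorted == NULL)
        return -1;

    memcpy(sorted, samples, n * sizeof(*sorted));
    qsort(sorted, n, sizeof(*sorted), cmp_double);
    assert(sorted[0] <= sorted[n - 1]);

    for (i = 0; i < n; i++)
        sum += sorted[i];

    out->avg = sum / (double)n;
    out->min = sorted[0];
    out->max = sorted[n - 1];
    /* Odd count: middle element. Even count: mean of the two middle ones. */
    out->median = (n % 2) ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

    free(sorted);
    return 0;
}
```

A gap between avg and median as large as in the first run (23.17 vs
13.66) is what a few very slow rounds produce: the outliers pull the
mean up while the median barely moves.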