* Issues about domU suspending/resuming
@ 2011-07-28 16:13 Gustavo Pimentel
2011-07-28 17:00 ` Keir Fraser
0 siblings, 1 reply; 7+ messages in thread
From: Gustavo Pimentel @ 2011-07-28 16:13 UTC (permalink / raw)
To: keir; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 225 bytes --]
Hi, I have looked through the MAINTAINERS file in the Xen source, but I
could not find a specific maintainer responsible for issues with the
domU suspend/resume procedure.
Can you point me to the right person?
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Issues about domU suspending/resuming
2011-07-28 16:13 Issues about domU suspending/resuming Gustavo Pimentel
@ 2011-07-28 17:00 ` Keir Fraser
2011-07-28 17:16 ` Gustavo Pimentel
0 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2011-07-28 17:00 UTC (permalink / raw)
To: Gustavo Pimentel; +Cc: xen-devel
On 28/07/2011 17:13, "Gustavo Pimentel" <gustavo.pimentel@efacec.com> wrote:
> Hi, I have looked through the MAINTAINERS file in the Xen source, but I
> could not find a specific maintainer responsible for issues with the domU
> suspend/resume procedure.
> Can you point me to the right person?
It crosses multiple subsystems. It could be a guest kernel bug for example,
or a toolstack bug. Post a bug report to xen-devel and we can try to triage
it.
-- Keir
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Issues about domU suspending/resuming
2011-07-28 17:00 ` Keir Fraser
@ 2011-07-28 17:16 ` Gustavo Pimentel
2011-07-28 17:32 ` Keir Fraser
0 siblings, 1 reply; 7+ messages in thread
From: Gustavo Pimentel @ 2011-07-28 17:16 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
It's not really a bug; it's a performance analysis.
I'm using Remus for an HA system, and after analysing the Remus log I noticed that the time to suspend the guest varies between 0.299 ms and 812.909 ms, and the time to resume the same guest varies between 0.387 ms and 1745.579 ms. The guest is a virtual machine with 64 MB of RAM and no CPU load.
Such a wide range of suspend/resume times for a guest seems very strange.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Issues about domU suspending/resuming
2011-07-28 17:16 ` Gustavo Pimentel
@ 2011-07-28 17:32 ` Keir Fraser
2011-07-28 18:22 ` Gustavo Pimentel
0 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2011-07-28 17:32 UTC (permalink / raw)
To: Gustavo Pimentel; +Cc: xen-devel
On 28/07/2011 18:16, "Gustavo Pimentel" <gustavo.pimentel@efacec.com> wrote:
> It's not really a bug; it's a performance analysis.
> I'm using Remus for an HA system, and after analysing the Remus log I noticed
> that the time to suspend the guest varies between 0.299 ms and 812.909 ms,
> and the time to resume the same guest varies between 0.387 ms and 1745.579 ms.
> The guest is a virtual machine with 64 MB of RAM and no CPU load.
>
> Such a wide range of suspend/resume times for a guest seems very strange.
First point of contact would be the listed Remus maintainer. More data would
be useful I expect -- e.g., does the variation occur with basic live
migration (no Remus)?
-- Keir
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Issues about domU suspending/resuming
2011-07-28 17:32 ` Keir Fraser
@ 2011-07-28 18:22 ` Gustavo Pimentel
2011-07-28 19:07 ` Keir Fraser
0 siblings, 1 reply; 7+ messages in thread
From: Gustavo Pimentel @ 2011-07-28 18:22 UTC (permalink / raw)
To: Keir Fraser; +Cc: rshriram, xen-devel
I haven't tested suspend/resume with basic live migration.
Do you know how I can invoke a Xen API to suspend and resume the guest? I could write a simple program to log suspend/resume times and later use it to analyse basic live migration.
Meanwhile, here is some information about my test system:
I am using xen4.2-unstable on an Intel(R) Pentium(R) 4 CPU 3.20GHz processor with hyper-threading, using the credit scheduler. I have pinned dom0 to vcpu0 and domU to vcpu1.
dom0 runs kernel version 2.6.32.40 and domU runs a 2.6.18 kernel, which has fast suspend support.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Issues about domU suspending/resuming
2011-07-28 18:22 ` Gustavo Pimentel
@ 2011-07-28 19:07 ` Keir Fraser
2011-08-02 8:29 ` Gustavo Pimentel
0 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2011-07-28 19:07 UTC (permalink / raw)
To: Gustavo Pimentel; +Cc: rshriram, xen-devel
Perhaps run a program in domU which calls gettimeofday() in a loop and logs
big deltas between successive calls. If the domain is otherwise idle, you
can probably set the threshold to only trigger during the suspend phase of
live migration. Then you don't need to hook into the suspend/resume parts of
the toolstack.
-- Keir
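A minimal sketch of the in-guest approach suggested above, not code from this thread. The names tv_delta_us and watch_gaps, the threshold parameter, and the output format are all assumptions made for illustration; only gettimeofday() comes from the suggestion itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Microseconds elapsed from *earlier to *later (hypothetical helper). */
static int64_t tv_delta_us(const struct timeval *later,
                           const struct timeval *earlier)
{
    return ((int64_t)later->tv_sec - (int64_t)earlier->tv_sec) * 1000000
           + ((int64_t)later->tv_usec - (int64_t)earlier->tv_usec);
}

/*
 * Poll gettimeofday() in a loop and print every gap of at least
 * threshold_us between two successive calls. Stops after max_calls
 * polls (0 means run until killed). Returns the number of gaps seen.
 * Backward clock steps give a negative delta and are not reported.
 */
static unsigned long watch_gaps(int64_t threshold_us, unsigned long max_calls)
{
    struct timeval prev, now;
    unsigned long calls = 0, gaps = 0;

    assert(threshold_us > 0);
    gettimeofday(&prev, NULL);
    while (max_calls == 0 || calls++ < max_calls) {
        gettimeofday(&now, NULL);
        int64_t delta = tv_delta_us(&now, &prev);
        if (delta >= threshold_us) {
            printf("gap of %lld us ending at %ld.%06ld\n",
                   (long long)delta, (long)now.tv_sec, (long)now.tv_usec);
            gaps++;
        }
        prev = now;
    }
    return gaps;
}
```

Calling watch_gaps(50000, 0) from a small main() in domU would log every stall of 50 ms or more; the 50 ms figure is only an example and would need tuning against the guest's normal scheduling jitter. Note that the loop spins, so the guest is no longer idle while it runs, which may itself change the numbers being measured.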
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Issues about domU suspending/resuming
2011-07-28 19:07 ` Keir Fraser
@ 2011-08-02 8:29 ` Gustavo Pimentel
0 siblings, 0 replies; 7+ messages in thread
From: Gustavo Pimentel @ 2011-08-02 8:29 UTC (permalink / raw)
To: Keir Fraser; +Cc: rshriram, xen-devel
[-- Attachment #1: Type: text/plain, Size: 4074 bytes --]
I used the attached file (suspend_resume.c) to generate a log, which I then fed to a "homemade" program I wrote to chart the suspend/resume behaviour of a guest over 30 minutes, with a suspend/resume cycle every 100 ms.
In ChartAnalysis.png you can see that the suspend/resume time is very consistent over the 30 minutes of the test, except for about 6 spikes (3 of them reaching about 400 ms, roughly 600 to 700 times the normal average). The test was performed with no CPU load on dom0 or domU. Here are some statistics:
Minimum=0.468 ms
Arithmetic Mean=0.652 ms
Geometric Mean=0.552 ms
Maximum=401.251 ms
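For readers who want to reproduce figures like these, here is a rough sketch of how the log could be summarised. It is not the "homemade" program mentioned above, which was not posted. It assumes the log lines have the REMUS:...:scall:<us>:rcall:<us>:... shape printed by the attached suspend_resume.c, with times in microseconds; parse_field_us, summarize_us, call_stats and the spike_factor parameter are hypothetical names. The geometric mean is left out because it needs log() and exp() from libm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long n;        /* number of samples */
    unsigned int min_us;    /* smallest sample, microseconds */
    unsigned int max_us;    /* largest sample, microseconds */
    double mean_us;         /* arithmetic mean, microseconds */
    unsigned long spikes;   /* samples above spike_factor * mean */
} call_stats;

/* Read the unsigned value that follows key (e.g. ":scall:") in line.
 * Returns 0 on success, -1 if the key or the number is missing. */
static int parse_field_us(const char *line, const char *key, unsigned int *out)
{
    const char *p = strstr(line, key);
    if (p == NULL)
        return -1;
    return sscanf(p + strlen(key), "%u", out) == 1 ? 0 : -1;
}

/* Two passes: first min/max/mean, then count samples above the
 * spike threshold derived from the mean. */
static call_stats summarize_us(const unsigned int *us, unsigned long n,
                               double spike_factor)
{
    call_stats st;
    double sum = 0.0;
    unsigned long i;

    assert(us != NULL && n > 0);
    st.n = n;
    st.min_us = st.max_us = us[0];
    st.spikes = 0;
    for (i = 0; i < n; i++) {
        if (us[i] < st.min_us)
            st.min_us = us[i];
        if (us[i] > st.max_us)
            st.max_us = us[i];
        sum += us[i];
    }
    st.mean_us = sum / (double)n;
    for (i = 0; i < n; i++)
        if ((double)us[i] > spike_factor * st.mean_us)
            st.spikes++;
    return st;
}
```

As a sanity check on the ratio quoted above: 401.251 ms divided by the arithmetic mean of 0.652 ms is about 615, and divided by the geometric mean of 0.552 ms is about 727, which matches "600 to 700 times" reasonably well.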
[-- Attachment #2: suspend_resume.c --]
[-- Type: application/octet-stream, Size: 14658 bytes --]
/******************************************************************************
 * suspend_resume.c
 *
 * Continuously suspends and resumes the domain. Nothing else is done.
 * Adapted from libcheckpoint API.
 *
 * Copyright (c) 2011 Shriram Rajagopalan (rshriram@cs.ubc.ca).
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation;
 * version 2.1 of the License.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 *
 */
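#include <errno.h>      /* errno is used in the error messages below */
#include <fcntl.h>
#include <stdint.h>     /* uint64_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h> /* select(), fd_set */
#include <sys/time.h>
#include <signal.h>
#include <sys/stat.h>
#include <unistd.h>
#include <time.h>
#include <xenctrl.h>
#include <xenguest.h>
#include <xs.h>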
typedef enum {
    dt_unknown,
    dt_pv,
    dt_hvm,
    dt_pvhvm /* HVM with PV drivers */
} checkpoint_domtype;
typedef struct {
    xc_interface *xch;
    xc_evtchn *xce;        /* event channel handle */
    struct xs_handle* xsh; /* xenstore handle */
    int watching_shutdown; /* state of watch on @releaseDomain */
    unsigned int domid;
    checkpoint_domtype domtype;
    int fd;
    int suspend_evtchn;
    char* errstr;
} checkpoint_state;
static char errbuf[256];
static int setup_suspend_evtchn(checkpoint_state* s);
static void release_suspend_evtchn(checkpoint_state *s);
static int setup_shutdown_watch(checkpoint_state* s);
static int check_shutdown(checkpoint_state* s);
static void release_shutdown_watch(checkpoint_state* s);
static int evtchn_suspend(checkpoint_state* s);
static int compat_suspend(checkpoint_state* s);
static int pollfd(checkpoint_state* s, int fd);
static int switch_qemu_logdirty(checkpoint_state* s, int enable);
static int suspend_hvm(checkpoint_state* s);
static int suspend_qemu(checkpoint_state* s);
static int resume_qemu(checkpoint_state* s);
/* Returns a string describing the most recent error returned by
 * a checkpoint function. Static -- do not free. */
char* checkpoint_error(checkpoint_state* s)
{
    return s->errstr;
}
void checkpoint_init(checkpoint_state* s)
{
    s->xch = NULL;
    s->xce = NULL;
    s->xsh = NULL;
    s->watching_shutdown = 0;
    s->domid = 0;
    s->domtype = dt_unknown;
    s->suspend_evtchn = -1;
    s->errstr = NULL;
}
void checkpoint_close(checkpoint_state* s);
/* open a checkpoint session to guest domid */
int checkpoint_open(checkpoint_state* s, unsigned int domid)
{
    xc_dominfo_t dominfo;
    unsigned long pvirq;
    s->domid = domid;
    s->xch = xc_interface_open(0,0,0);
    if (!s->xch) {
        s->errstr = "could not open control interface (are you root?)";
        return -1;
    }
    s->xsh = xs_daemon_open();
    if (!s->xsh) {
        checkpoint_close(s);
        s->errstr = "could not open xenstore handle";
        return -1;
    }
    s->xce = xc_evtchn_open(NULL, 0);
    if (s->xce == NULL) {
        checkpoint_close(s);
        s->errstr = "could not open event channel handle";
        return -1;
    }
    if (xc_domain_getinfo(s->xch, s->domid, 1, &dominfo) < 0) {
        checkpoint_close(s);
        s->errstr = "could not get domain info";
        return -1;
    }
    if (dominfo.hvm) {
        if (xc_get_hvm_param(s->xch, s->domid, HVM_PARAM_CALLBACK_IRQ, &pvirq)) {
            checkpoint_close(s);
            s->errstr = "could not get HVM callback IRQ";
            return -1;
        }
        s->domtype = pvirq ? dt_pvhvm : dt_hvm;
    } else
        s->domtype = dt_pv;
    if (setup_shutdown_watch(s) < 0) {
        checkpoint_close(s);
        return -1;
    }
    if (s->domtype == dt_pv) {
        if (setup_suspend_evtchn(s) < 0) {
            fprintf(stderr, "WARNING: suspend event channel unavailable, "
                    "falling back to slow xenstore signalling\n");
        }
    }
    if ((s->domtype > dt_pv) && switch_qemu_logdirty(s, 1))
        return -1;
    return 0;
}
void checkpoint_close(checkpoint_state* s)
{
    if (s->domtype > dt_pv)
        switch_qemu_logdirty(s, 0);
    release_shutdown_watch(s);
    release_suspend_evtchn(s);
    if (s->xch) {
        xc_interface_close(s->xch);
        s->xch = NULL;
    }
    if (s->xce != NULL) {
        xc_evtchn_close(s->xce);
        s->xce = NULL;
    }
    if (s->xsh) {
        xs_daemon_close(s->xsh);
        s->xsh = NULL;
    }
    s->domid = 0;
}
/* suspend the domain. Returns 0 on failure, 1 on success */
int checkpoint_suspend(checkpoint_state* s)
{
    int rc;
    if (s->suspend_evtchn >= 0)
        rc = evtchn_suspend(s);
    else if (s->domtype == dt_hvm)
        rc = suspend_hvm(s);
    else
        rc = compat_suspend(s);
    return rc < 0 ? 0 : 1;
}
/* let guest execution resume. Returns 0 on success, -1 on failure */
int checkpoint_resume(checkpoint_state* s)
{
    if (xc_domain_resume(s->xch, s->domid, 1)) {
        snprintf(errbuf, sizeof(errbuf), "error resuming domain: %d", errno);
        s->errstr = errbuf;
        return -1;
    }
    if (s->domtype > dt_pv && resume_qemu(s) < 0)
        return -1;
    /* restore watchability in xenstore */
    if (xs_resume_domain(s->xsh, s->domid) < 0)
        fprintf(stderr, "error resuming domain in xenstore\n");
    return 0;
}
/* Set up event channel used to signal a guest to suspend itself */
static int setup_suspend_evtchn(checkpoint_state* s)
{
    int port;
    port = xs_suspend_evtchn_port(s->domid);
    if (port < 0) {
        s->errstr = "failed to read suspend event channel";
        return -1;
    }
    s->suspend_evtchn = xc_suspend_evtchn_init(s->xch, s->xce, s->domid, port);
    if (s->suspend_evtchn < 0) {
        s->errstr = "failed to bind suspend event channel";
        return -1;
    }
    fprintf(stderr, "bound to suspend event channel %u:%d as %d\n", s->domid, port,
            s->suspend_evtchn);
    return 0;
}
/* release suspend event channels bound to guest */
static void release_suspend_evtchn(checkpoint_state *s)
{
    /* TODO: teach xen to clean up if port is unbound */
    if (s->xce != NULL && s->suspend_evtchn >= 0) {
        xc_suspend_evtchn_release(s->xch, s->xce, s->domid, s->suspend_evtchn);
        s->suspend_evtchn = -1;
    }
}
static int setup_shutdown_watch(checkpoint_state* s)
{
    char buf[16];
    /* write domain ID to watch so we can ignore other domain shutdowns */
    snprintf(buf, sizeof(buf), "%u", s->domid);
    if ( !xs_watch(s->xsh, "@releaseDomain", buf) ) {
        fprintf(stderr, "Could not bind to shutdown watch\n");
        return -1;
    }
    /* watch fires once on registration */
    s->watching_shutdown = 1;
    check_shutdown(s);
    return 0;
}
/* returns -1 on error or death, 0 if domain is running, 1 if suspended */
static int check_shutdown(checkpoint_state* s) {
    unsigned int count;
    int xsfd;
    char **vec;
    char buf[16];
    xc_dominfo_t info;
    /* for hvms, wait for the xenstore watch */
    if (s->domtype > dt_pv) {
        xsfd = xs_fileno(s->xsh);
        /* loop on watch if it fires for another domain */
        while (1) {
            if (pollfd(s, xsfd) < 0)
                return -1;
            vec = xs_read_watch(s->xsh, &count);
            if (s->watching_shutdown == 1) {
                s->watching_shutdown = 2;
                free(vec); /* xs_read_watch() result is malloc'd */
                return 0;
            }
            if (!vec) {
                fprintf(stderr, "empty watch fired\n");
                continue;
            }
            /* domid is unsigned, and the token was registered with %u */
            snprintf(buf, sizeof(buf), "%u", s->domid);
            if (!strcmp(vec[XS_WATCH_TOKEN], buf)) {
                free(vec);
                break;
            }
            free(vec);
        }
    }
    if (xc_domain_getinfo(s->xch, s->domid, 1, &info) != 1
        || info.domid != s->domid) {
        snprintf(errbuf, sizeof(errbuf),
                 "error getting info for domain %u", s->domid);
        s->errstr = errbuf;
        return -1;
    }
    if (!info.shutdown) {
        snprintf(errbuf, sizeof(errbuf),
                 "domain %u not shut down", s->domid);
        s->errstr = errbuf;
        return 0;
    }
    if (info.shutdown_reason != SHUTDOWN_suspend)
        return -1;
    return 1;
}
static void release_shutdown_watch(checkpoint_state* s) {
    char buf[16];
    if (!s->xsh)
        return;
    if (!s->watching_shutdown)
        return;
    snprintf(buf, sizeof(buf), "%u", s->domid);
    if (!xs_unwatch(s->xsh, "@releaseDomain", buf))
        fprintf(stderr, "Could not release shutdown watch\n");
    s->watching_shutdown = 0;
}
static int evtchn_suspend(checkpoint_state* s)
{
    int rc;
    rc = xc_evtchn_notify(s->xce, s->suspend_evtchn);
    if (rc < 0) {
        snprintf(errbuf, sizeof(errbuf),
                 "failed to notify suspend event channel: %d", rc);
        s->errstr = errbuf;
        return -1;
    }
    do {
        if (!(rc = pollfd(s, xc_evtchn_fd(s->xce))))
            rc = xc_evtchn_pending(s->xce);
    } while (rc >= 0 && rc != s->suspend_evtchn);
    if (rc <= 0)
        return -1;
    if (xc_evtchn_unmask(s->xce, s->suspend_evtchn) < 0) {
        snprintf(errbuf, sizeof(errbuf),
                 "failed to unmask suspend notification channel: %d", rc);
        s->errstr = errbuf;
        return -1;
    }
    if (check_shutdown(s) != 1)
        return -1;
    return 0;
}
/* suspend through xenstore if suspend event channel is unavailable */
static int compat_suspend(checkpoint_state* s)
{
    char path[128];
    snprintf(path, sizeof(path), "/local/domain/%u/control/shutdown", s->domid);
    if (!xs_write(s->xsh, XBT_NULL, path, "suspend", 7)) {
        /* the original message here was copied from the logdirty code */
        s->errstr = "error writing suspend request to xenstore";
        return -1;
    }
    if (check_shutdown(s) != 1)
        return -1;
    return 0;
}
/* returns -1 if fd does not become readable within timeout */
static int pollfd(checkpoint_state* s, int fd)
{
    fd_set rfds;
    struct timeval tv;
    int rc;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 5;
    tv.tv_usec = 500000;
    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0) {
        snprintf(errbuf, sizeof(errbuf),
                 "error polling fd: %s", strerror(errno));
        s->errstr = errbuf;
    } else if (!rc) {
        snprintf(errbuf, sizeof(errbuf), "timeout polling fd");
        s->errstr = errbuf;
    } else if (! FD_ISSET(fd, &rfds)) {
        snprintf(errbuf, sizeof(errbuf), "unknown error polling fd");
        s->errstr = errbuf;
    } else
        return 0;
    return -1;
}
/* adapted from the eponymous function in xc_save */
static int switch_qemu_logdirty(checkpoint_state *s, int enable)
{
    char path[128];
    char *tail, *cmd, *response;
    char **vec;
    unsigned int len;
    sprintf(path, "/local/domain/0/device-model/%u/logdirty/", s->domid);
    tail = path + strlen(path);
    strcpy(tail, "ret");
    if (!xs_watch(s->xsh, path, "qemu-logdirty-ret")) {
        s->errstr = "error watching qemu logdirty return";
        return 1;
    }
    /* null fire. XXX unify with shutdown watch! */
    vec = xs_read_watch(s->xsh, &len);
    free(vec);
    strcpy(tail, "cmd");
    cmd = enable ? "enable" : "disable";
    if (!xs_write(s->xsh, XBT_NULL, path, cmd, strlen(cmd))) {
        s->errstr = "error signalling qemu logdirty";
        return 1;
    }
    vec = xs_read_watch(s->xsh, &len);
    free(vec);
    strcpy(tail, "ret");
    xs_unwatch(s->xsh, path, "qemu-logdirty-ret");
    response = xs_read(s->xsh, XBT_NULL, path, &len);
    /* test the pointer: len is not meaningful if xs_read() failed */
    if (!response || strcmp(response, cmd)) {
        free(response);
        s->errstr = "qemu logdirty command failed";
        return 1;
    }
    free(response);
    fprintf(stderr, "qemu logdirty mode: %s\n", cmd);
    return 0;
}
static int suspend_hvm(checkpoint_state *s)
{
    int rc = -1;
    fprintf(stderr, "issuing HVM suspend hypercall\n");
    rc = xc_domain_shutdown(s->xch, s->domid, SHUTDOWN_suspend);
    if (rc < 0) {
        s->errstr = "shutdown hypercall failed";
        return -1;
    }
    fprintf(stderr, "suspend hypercall returned %d\n", rc);
    if (check_shutdown(s) != 1)
        return -1;
    rc = suspend_qemu(s);
    return rc;
}
static int suspend_qemu(checkpoint_state *s)
{
    char path[128];
    fprintf(stderr, "pausing QEMU\n");
    sprintf(path, "/local/domain/0/device-model/%d/command", s->domid);
    if (!xs_write(s->xsh, XBT_NULL, path, "save", 4)) {
        fprintf(stderr, "error signalling QEMU to save\n");
        return -1;
    }
    sprintf(path, "/local/domain/0/device-model/%d/state", s->domid);
    do {
        char* state;
        unsigned int len;
        state = xs_read(s->xsh, XBT_NULL, path, &len);
        if (!state) {
            s->errstr = "error reading QEMU state";
            return -1;
        }
        if (!strcmp(state, "paused")) {
            free(state);
            return 0;
        }
        free(state);
        usleep(1000);
    } while(1);
    return -1;
}
static int resume_qemu(checkpoint_state *s)
{
    char path[128];
    fprintf(stderr, "resuming QEMU\n");
    sprintf(path, "/local/domain/0/device-model/%d/command", s->domid);
    if (!xs_write(s->xsh, XBT_NULL, path, "continue", 8)) {
        fprintf(stderr, "error signalling QEMU to resume\n");
        return -1;
    }
    return 0;
}
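/* microseconds elapsed between two timestamps (new must not precede old) */
static uint64_t tv_delta(struct timeval *new, struct timeval *old)
{
    return (((new->tv_sec - old->tv_sec)*1000000) +
            (new->tv_usec - old->tv_usec));
}
/* written from a signal handler, so it must be volatile sig_atomic_t */
volatile sig_atomic_t quit = 0;
void stopme(int signum)
{
    (void)signum;
    quit = 1;
}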
int main(int argc, char *argv[])
{
    int domid, interval, runtime = 0;
    checkpoint_state s;
    int iter = 0;
    unsigned int scall, rcall;
    struct timeval suspendCall, suspendTime, resumeTime;
    if (argc < 3)
    {
        fprintf(stderr, "usage: suspend_resume <domID (not name!)> interval(ms) [testTime (s)]\n");
        exit(1);
    }
    signal(SIGINT, stopme);
    signal(SIGALRM, stopme);
    signal(SIGTERM, stopme);
    domid = atoi(argv[1]);
    interval = atoi(argv[2]);
    if (argc > 3)
        runtime = atoi(argv[3]);
    checkpoint_init(&s);
    if (checkpoint_open(&s, domid) < 0)
    {
        fprintf(stderr, "error setting up suspend interface to dom %d\n", domid);
        exit(1);
    }
    if (runtime)
        alarm(runtime);
    while (!quit)
    {
        iter++;
        gettimeofday(&suspendCall, 0);
        if (!checkpoint_suspend(&s))
        {
            fprintf(stderr, "failed to suspend domain %d\n", domid);
            exit(1);
        }
        gettimeofday(&suspendTime, 0);
        if (checkpoint_resume(&s) < 0)
        {
            fprintf(stderr, "failed to resume domain %d\n", domid);
            exit(1);
        }
        gettimeofday(&resumeTime, 0);
        /* scall and rcall are in microseconds */
        scall = (unsigned int)(tv_delta(&suspendTime, &suspendCall));
        rcall = (unsigned int)(tv_delta(&resumeTime, &suspendTime));
        /* the seven trailing fields are unused here and always printed as 0 */
        printf("REMUS:%d:suspendAt:%lu.%06lu:scall:%u:rcall:%u:dcall:%u:suspendFor:%u:ctime:%u:flush:%u:commit:%u:tosend:%u:comp:%u\n",
               iter, (unsigned long)suspendCall.tv_sec,
               (unsigned long)suspendCall.tv_usec, scall, rcall,
               0u, 0u, 0u, 0u, 0u, 0u, 0u);
        usleep(interval * 1000);
    }
    checkpoint_close(&s);
    return 0;
}
[-- Attachment #3: ChartAnalysis.png --]
[-- Type: image/png, Size: 17433 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread