xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v5 00/21] libxl: domain save/restore: run in a separate process
Date: Wed, 27 Jun 2012 12:42:41 -0400	[thread overview]
Message-ID: <CAP8mzPM6F=gKB4hGi1JZqkuvYzcOmv8-SfLJODRC5Ls0kkiuUQ@mail.gmail.com> (raw)
In-Reply-To: <CAP8mzPOLED3CTHh4_YfqXcb0MF4WGdsPWM_1JX4MktS_4OU6vQ@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 6212 bytes --]

On Wed, Jun 27, 2012 at 12:09 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca>wrote:

> On Wed, Jun 27, 2012 at 11:59 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:
>
>> Ian Jackson writes ("Re: [PATCH v5 00/21] libxl: domain save/restore: run
>> in a separate process"):
>> > However, when I apply my series, I can indeed produce an assertion
>> > failure:
>> ...
>> > So I have indeed made matters worse.
>>
>> I found two bugs:
>>
>> 1. The void* passed to the callback was being treated as a
>> libxl__domain_suspend_state* by the remus callbacks; this is a
>> holdover from a much earlier version of the series.  It should be
>> converted to a libxl__save_helper_state and then the dss extracted
>> with CONTAINER_OF.
>>
>> 2. The way remus works means that the toolstack save callback is
>> invoked more than once, which the helper's implementation was not
>> prepared to deal with.  Fix this by moving the rewind of the fd into
>> the helper.
>>
>> Fixes for these are below.  With this, on top of my series, seem to I
>> get the same behaviour as with the baseline.  Would you like to try it ?
>>
>>
> Sure, I ll give it a shot.
> Btw, my earlier mail was in response to remus not
> working on the baseline setup on your dev environment.
>
>
The fix works for 2 out of 3 cases
 blackhole replication (xl remus -b)
 localhost replication with failover i.e. destroy primary (xl remus domU
localhost)

However, it crashes the guest for localhost replication, when I destroy the
backup
i.e. xl destroy domU--incoming . The primary guest would generally resume,
but in this
case its in --sc- state.
NB: This seems to happen in baseline xen-unstable too!.

xc: error: unexpected PFN mapping failure pfn 180e map_mfn 43b808 p2m_mfn
43b808: Internal error
libxl: error: libxl_create.c:760:libxl__xc_domain_restore_done: restoring
domain: Resource temporarily unavailable
libxl: error: libxl_create.c:844:domcreate_rebuild_done: cannot (re-)build
domain: -3
libxl: error: libxl.c:1220:libxl_domain_destroy: non-existant domain 17
libxl: error: libxl_create.c:995:domcreate_complete: unable to destroy
domain 17 following failed creation
migration target: Domain creation failed (code -3).
..
Total Data Sent= 12.597 MB
libxl: debug: libxl_dom.c:801:libxl__domain_suspend_common_callback:
issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:805:libxl__domain_suspend_common_callback: wait
for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:852:libxl__domain_suspend_common_callback: guest
acknowledged suspend request
libxl: debug: libxl_dom.c:856:libxl__domain_suspend_common_callback: wait
for the guest to suspend
libxl: debug: libxl_dom.c:870:libxl__domain_suspend_common_callback: guest
has suspended
pagetables=2,cache_misses=0,emptypages=45
libxl: error: libxl_utils.c:363:libxl_read_exactly: file/stream truncated
reading ipc msg header from domain 16 save/restore helper stdout pipe
libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 16
save/restore helper [3148] died due to fatal signal Broken pipe
libxl: debug: libxl_event.c:1434:libxl__ao_complete: ao 0x1b08c80:
complete, rc=-3
libxl: debug: libxl_event.c:1406:libxl__ao__destroy: ao 0x1b08c80: destroy
remus sender: libxl_domain_suspend failed (rc=-3)
Remus: Backup failed? resuming domain at primary.
xc: debug: hypercall buffer: total allocations:2116 total releases:2116
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:1729 misses:2 toobig:385




>
>
>> Thanks,
>> Ian.
>>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index abc5932..069aca1 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -984,7 +984,8 @@ static int libxl__remus_domain_suspend_callback(void
>> *data)
>>
>>  static int libxl__remus_domain_resume_callback(void *data)
>>  {
>> -    libxl__domain_suspend_state *dss = data;
>> +    libxl__save_helper_state *shs = data;
>> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>>     STATE_AO_GC(dss->ao);
>>
>>      /* Resumes the domain and the device model */
>> @@ -1002,7 +1003,8 @@ static void remus_checkpoint_dm_saved(libxl__egc
>> *egc,
>>
>>  static void libxl__remus_domain_checkpoint_callback(void *data)
>>  {
>> -    libxl__domain_suspend_state *dss = data;
>> +    libxl__save_helper_state *shs = data;
>> +    libxl__domain_suspend_state *dss = CONTAINER_OF(shs, *dss, shs);
>>     libxl__egc *egc = dss->shs.egc;
>>     STATE_AO_GC(dss->ao);
>>
>> diff --git a/tools/libxl/libxl_save_callout.c
>> b/tools/libxl/libxl_save_callout.c
>> index 6332beb..078b7ee 100644
>> --- a/tools/libxl/libxl_save_callout.c
>> +++ b/tools/libxl/libxl_save_callout.c
>> @@ -105,13 +105,6 @@ void libxl__xc_domain_save(libxl__egc *egc,
>> libxl__domain_suspend_state *dss,
>>                                 toolstack_data_buf, toolstack_data_len,
>>                                  "toolstack data tmpfile", 0);
>>          if (r) { rc = ERROR_FAIL; goto out; }
>> -
>> -        r = lseek(toolstack_data_fd, 0, SEEK_SET);
>> -        if (r) {
>> -            LOGE(ERROR, "rewind toolstack data tmpfile");
>> -            rc = ERROR_FAIL;
>> -            goto out;
>> -        }
>>     }
>>
>>     const unsigned long argnums[] = {
>> diff --git a/tools/libxl/libxl_save_helper.c
>> b/tools/libxl/libxl_save_helper.c
>> index 3bdfa28..772251a 100644
>> --- a/tools/libxl/libxl_save_helper.c
>> +++ b/tools/libxl/libxl_save_helper.c
>> @@ -171,12 +171,14 @@ static int toolstack_save_cb(uint32_t domid,
>> uint8_t **buf,
>>  {
>>     assert(toolstack_save_fd > 0);
>>
>> +    int r = lseek(toolstack_save_fd, 0, SEEK_SET);
>> +    if (r) fail(errno,"rewind toolstack data tmpfile");
>> +
>>      *buf = xmalloc(toolstack_save_len);
>> -    int r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
>> +    r = read_exactly(toolstack_save_fd, *buf, toolstack_save_len);
>>     if (r<0) fail(errno,"read toolstack data");
>>      if (r==0) fail(0,"read toolstack data eof");
>>
>> -    toolstack_save_fd = -1;
>>     *len = toolstack_save_len;
>>     return 0;
>>  }
>>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 7981 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2012-06-27 16:42 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-26 17:54 [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
2012-06-26 17:54 ` [PATCH 01/21] libxc: xc_domain_restore, make toolstack_restore const-correct Ian Jackson
2012-06-26 17:54 ` [PATCH 02/21] libxc: Do not segfault if (e.g.) switch_qemu_logdirty fails Ian Jackson
2012-06-26 17:55 ` [PATCH 03/21] libxl: domain save: rename variables etc Ian Jackson
2012-06-26 17:55 ` [PATCH 04/21] libxl: domain restore: reshuffle, preparing for ao Ian Jackson
2012-06-26 17:55 ` [PATCH 05/21] libxl: domain save: API changes for asynchrony Ian Jackson
2012-06-26 17:55 ` [PATCH 06/21] libxl: domain save/restore: run in a separate process Ian Jackson
2012-06-26 17:55 ` [PATCH 07/21] libxl: rename libxl_dom:save_helper to physmap_path Ian Jackson
2012-06-26 17:55 ` [PATCH 08/21] libxl: provide libxl__xs_*_checked and libxl__xs_transaction_* Ian Jackson
2012-06-26 17:55 ` [PATCH 09/21] libxl: wait for qemu to acknowledge logdirty command Ian Jackson
2012-06-26 17:55 ` [PATCH 10/21] libxl: datacopier: provide "prefix data" facility Ian Jackson
2012-06-26 17:55 ` [PATCH 11/21] libxl: prepare for asynchronous writing of qemu save file Ian Jackson
2012-06-26 17:55 ` [PATCH 12/21] libxl: Make libxl__domain_save_device_model asynchronous Ian Jackson
2012-06-26 17:55 ` [PATCH 13/21] libxl: Add a gc to libxl_get_cpu_topology Ian Jackson
2012-06-26 17:55 ` [PATCH 14/21] libxl: Do not pass NULL as gc_opt; introduce NOGC Ian Jackson
2012-06-26 17:55 ` [PATCH 15/21] libxl: Get compiler to warn about gc_opt==NULL Ian Jackson
2012-06-28 17:56   ` Ian Jackson
2012-06-26 17:55 ` [PATCH 16/21] xl: Handle return value from libxl_domain_suspend correctly Ian Jackson
2012-06-26 17:55 ` [PATCH 17/21] libxl: do not leak dms->saved_state Ian Jackson
2012-06-26 17:55 ` [PATCH 18/21] libxl: do not leak spawned middle children Ian Jackson
2012-06-26 17:55 ` [PATCH 19/21] libxl: do not leak an event struct on ignored ao progress Ian Jackson
2012-06-26 17:55 ` [PATCH 20/21] libxl: further fixups re LIBXL_DOMAIN_TYPE Ian Jackson
2012-06-26 17:55 ` [PATCH 21/21] libxl: DO NOT APPLY enforce prohibition on internal Ian Jackson
2012-06-26 18:00 ` [PATCH v5 00/21] libxl: domain save/restore: run in a separate process Ian Jackson
2012-06-26 18:44   ` Shriram Rajagopalan
2012-06-27  1:25     ` Shriram Rajagopalan
2012-06-27 13:46       ` Ian Jackson
2012-06-27 15:59         ` Ian Jackson
2012-06-27 16:09           ` Shriram Rajagopalan
2012-06-27 16:42             ` Shriram Rajagopalan [this message]
2012-06-28 11:24               ` Ian Jackson
2012-06-27 16:06         ` Shriram Rajagopalan
2012-06-27 13:17     ` Ian Jackson
2012-06-27 13:28       ` Shriram Rajagopalan
2012-06-28 13:38 ` [PATCH v6 " Ian Jackson
2012-06-28 13:50   ` Ian Campbell
2012-06-28 14:24     ` Ian Jackson
2012-06-28 14:44       ` Ian Campbell
2012-06-28 15:17       ` Shriram Rajagopalan
2012-06-28 17:45   ` Ian Jackson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP8mzPM6F=gKB4hGi1JZqkuvYzcOmv8-SfLJODRC5Ls0kkiuUQ@mail.gmail.com' \
    --to=rshriram@cs.ubc.ca \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).