All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: rshriram@cs.ubc.ca
Cc: Yang Hongyang <yanghy@cn.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save
Date: Wed, 16 Jul 2014 17:33:21 +0100	[thread overview]
Message-ID: <53C6A951.3060605@citrix.com> (raw)
In-Reply-To: <CAP8mzPPkooX80fhAL9XFkUUNfY69177xesh_jwahsc7y-n1p_A@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 3999 bytes --]

On 16/07/14 17:02, Shriram Rajagopalan wrote:


>  
>
>>      
>>
>>         +            rc = send_domain_memory_live(ctx);
>>         +        }
>>         +        else
>>         +        {
>>         +            DPRINTF("Starting nonlive save");
>>         +            rc = send_domain_memory_nonlive(ctx);
>>         +        }
>>
>>         -    if ( rc )
>>         -        goto err;
>>         +        if ( rc )
>>         +            goto err;
>>
>>         -    /* Refresh domain information now it has paused. */
>>         -    if ( (xc_domain_getinfo(xch, ctx->domid, 1,
>>         &ctx->dominfo) != 1) ||
>>         -         (ctx->dominfo.domid != ctx->domid) )
>>         -    {
>>         -        PERROR("Unable to refresh domain information");
>>         -        rc = -1;
>>         -        goto err;
>>         -    }
>>         -    else if ( (!ctx->dominfo.shutdown ||
>>         -               ctx->dominfo.shutdown_reason !=
>>         SHUTDOWN_suspend ) &&
>>         -              !ctx->dominfo.paused )
>>         -    {
>>         -        ERROR("Domain has not been suspended");
>>         -        rc = -1;
>>         -        goto err;
>>         -    }
>>         +        /* Refresh domain information now it has paused. */
>>         +        if ( (xc_domain_getinfo(xch, ctx->domid, 1,
>>         &ctx->dominfo) != 1) ||
>>         +             (ctx->dominfo.domid != ctx->domid) )
>>         +        {
>>         +            PERROR("Unable to refresh domain information");
>>         +            rc = -1;
>>         +            goto err;
>>         +        }
>>         +        else if ( (!ctx->dominfo.shutdown ||
>>         +                  ctx->dominfo.shutdown_reason !=
>>         SHUTDOWN_suspend ) &&
>>         +                  !ctx->dominfo.paused )
>>         +        {
>>         +            ERROR("Domain has not been suspended");
>>         +            rc = -1;
>>         +            goto err;
>>         +        }
>>
>>
>>     A silly question: Shouldn't this check for 'suspended status' be
>>     before the call to 
>>     send_domain_memory_live() [under remus]. I am assuming that the
>>     send_domain_memory_live() function is the one that sends the
>>     dirty page data
>>     out on the wire.
>
>     Even for non-live migrates, we have to ensure that the VM has
>     entered the suspend state.  A PV guest cannot possibly recover
>     from resume if it didn't voluntarily suspend as it will loose its
>     reference to the start info page (whos mfn lives in vcpu0.edx for
>     the duration of the migrate and must be updated on the receiving
>     side).  Any VM which doesn't sufficiently quiesce its fronends
>     risks ring and memory corruption on resume.
>
>     In this case, the send_domain_memory_XXX functions do take care of
>     pausing the domain at the appropriate time, but this here is a
>     sanity check.
>
>
> True. I am assuming that the send_domain_memory_XXX functions do some
> heavy lifting (copying out dirty pages onto the fd, etc).  So,
> shouldn't the sanity 
> check occur right after issuing a suspend op, i.e., in the
> send_domain_memory_XXX functions,
> before attempting to send the domain memory on the wire?
>
> Assuming that the check is already there in those functions, then this
> block 
> above would be redundant.
>
>

You are correct.  The code is in the way it is because of the order in
which I implemented things when building live migration from first
principles.

suspend_domain() confirming that the domain has indeed suspended is a
recent addition, and does indeed negate the refresh of the domain
information in save().  I will fix that in the upcoming series.

However, I think keeping the sanity check that the domain has actually
been paused is still a good idea.  It is a stated precondition of the
following functions and I know for certain that debugging the tail of
migration with an unpaused VM causes some very subtle bugs on resume.

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 9380 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2014-07-16 16:33 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-09  7:47 [RFC PATCH 0/3] Remus: add remus support for migration v2 Yang Hongyang
2014-07-09  7:47 ` [RFC PATCH 1/3] remus: add a bool var to indicate checkpointed stream Yang Hongyang
2014-07-09  9:45   ` Andrew Cooper
2014-07-09  9:53     ` Hongyang Yang
2014-07-09  7:47 ` [RFC PATCH 2/3] remus: implement remus checkpoint in v2 save Yang Hongyang
2014-07-09 10:53   ` Andrew Cooper
2014-07-10  3:25     ` Hongyang Yang
2014-07-10  8:49       ` Ian Campbell
2014-07-10  9:24       ` Andrew Cooper
2014-07-16 15:22   ` Shriram Rajagopalan
2014-07-16 15:38     ` Andrew Cooper
2014-07-16 16:02       ` Shriram Rajagopalan
2014-07-16 16:33         ` Andrew Cooper [this message]
2014-07-09  7:47 ` [RFC PATCH 3/3] remus: adjust x86 pv restore to support remus Yang Hongyang
2014-07-09 11:16   ` Andrew Cooper
2014-07-09 11:26     ` Andrew Cooper
2014-07-10  3:30       ` Hongyang Yang
2014-07-10  9:25         ` Andrew Cooper
2014-07-10  9:32           ` Hongyang Yang
2014-07-10  9:42             ` Andrew Cooper
2014-07-10  9:47               ` Hongyang Yang
2014-07-09  8:53 ` [RFC PATCH 0/3] Remus: add remus support for migration v2 Ian Campbell
2014-07-09  9:56   ` Hongyang Yang
2014-07-09  9:42 ` Andrew Cooper
2014-07-09 10:06   ` Hongyang Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53C6A951.3060605@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=rshriram@cs.ubc.ca \
    --cc=xen-devel@lists.xen.org \
    --cc=yanghy@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.