xen-devel.lists.xenproject.org archive mirror
From: Brendan Cully <Brendan@cs.ubc.ca>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: Andreas Olsowski <andreas.olsowski@uni.leuphana.de>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: Re: slow live migration / xc_restore on xen4 pvops
Date: Wed, 2 Jun 2010 18:04:19 -0700	[thread overview]
Message-ID: <20100603010418.GB2028@kremvax.cs.ubc.ca> (raw)
In-Reply-To: <C82C445E.167B0%keir.fraser@eu.citrix.com>

On Wednesday, 02 June 2010 at 17:24, Keir Fraser wrote:
> On 02/06/2010 17:18, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
> 
> > Andreas Olsowski writes ("[Xen-devel] slow live migration / xc_restore on xen4
> > pvops"):
> >> [2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal
> >> error: Error when reading batch size
> >> [2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal
> >> error: error when buffering batch, finishing
> > 
> > These errors, and the slowness of migrations, are caused by changes
> > made to support Remus.  Previously, a migration would be regarded as
> > complete as soon as the final information including CPU states was
> > received at the migration target.  xc_domain_restore would return
> > immediately at that point.
> 
> This probably needs someone with Remus knowledge to take a look, to keep all
> cases working correctly. I'll Cc Brendan. It'd be good to get this fixed for
> a 4.0.1 in a few weeks.

I've done a bit of profiling of the restore code and observed the
slowness here too. It looks to me like it's related to the superpage
changes. The big hit appears to be at the start of the restore
process, during calls to allocate_mfn_list in the normal_page case:
we seem to be calling xc_domain_memory_populate_physmap once per page
there instead of batching the allocation. I haven't had time to
investigate further today, but I think this is the culprit.
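To make the batching point concrete, here is a minimal C sketch. mock_populate_physmap() is a hypothetical stand-in that only counts invocations (the real xc_domain_memory_populate_physmap() issues a hypercall, which is where the per-call cost comes from); populate_per_page() and populate_batched() are likewise illustrative names, not libxc functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hypercall-backed allocator such as
 * xc_domain_memory_populate_physmap(); it only counts how often it is
 * invoked, which is the cost being compared here. */
static unsigned long populate_calls = 0;

static int mock_populate_physmap(const unsigned long *pfns, size_t count)
{
    (void)pfns;
    (void)count;
    populate_calls++;
    return 0;
}

/* One call per page: the pattern suspected in the normal_page case. */
static int populate_per_page(const unsigned long *pfns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (mock_populate_physmap(&pfns[i], 1) != 0)
            return -1;
    return 0;
}

/* Batched: pass up to max_batch consecutive entries per call. */
static int populate_batched(const unsigned long *pfns, size_t n,
                            size_t max_batch)
{
    assert(max_batch > 0);
    for (size_t i = 0; i < n; i += max_batch) {
        size_t chunk = (n - i < max_batch) ? n - i : max_batch;
        if (mock_populate_physmap(&pfns[i], chunk) != 0)
            return -1;
    }
    return 0;
}
```

With 1024 pages and a batch size of 256, the per-page loop makes 1024 calls while the batched loop makes 4; the saving scales with guest memory size.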

> 
>  -- Keir
> 
> > Since the Remus patches, xc_domain_restore waits until it gets an IO
> > error, and it also has a very short timeout which induces IO errors
> > whenever nothing is received within that window.  This is correct in
> > the Remus case but wrong in the normal case.
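The failure mode described above, where a quiet sender is turned into an IO error, can be sketched with poll(2). read_exact_timeout() is a hypothetical helper, not the libxc reader; it only illustrates that with a short per-read timeout, a healthy but idle connection looks the same to the caller as a broken one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd, waiting at most
 * timeout_ms for each chunk. Returns 0 on success; on failure returns -1
 * with errno set (ETIMEDOUT if the peer went quiet, 0 on EOF). */
static int read_exact_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (rc == 0) {            /* nothing arrived within timeout_ms */
            errno = ETIMEDOUT;
            return -1;
        }
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        if (n == 0) {             /* sender closed the connection */
            errno = 0;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

Both the timeout and the closed connection surface as the same -1 return, which is why the caller cannot tell "migration finished" from "sender paused for a moment" unless it is told which case to expect.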
> > 
> > The code should be changed so that xc_domain_restore
> >  (a) takes an explicit parameter for the IO timeout, which
> >      should default to something much longer than the 100ms or so of
> >      the Remus case, and
> >  (b) gets told whether
> >     (i) it should return immediately after receiving the "tail"
> >         which contains the CPU state; or
> >     (ii) it should attempt to keep reading after receiving the "tail"
> >         and only return when the connection fails.
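Proposals (a) and (b) could look roughly like the following C sketch. The enum, struct and field names and the 60-second default are all hypothetical, chosen only for illustration; read_checkpoint_fn stands in for the real page-batch and tail buffering, and fake_read_checkpoint() is a test double for the socket.

```c
#include <assert.h>

/* Hypothetical option names; libxc does not define these. */
enum restore_mode {
    RESTORE_RETURN_AFTER_TAIL,   /* (b)(i): ordinary migration */
    RESTORE_READ_UNTIL_FAILURE   /* (b)(ii): Remus checkpoint stream */
};

struct restore_opts {
    int io_timeout_ms;           /* (a): explicit IO timeout */
    enum restore_mode mode;
};

/* Assumed default for plain migration: a generous timeout (arbitrary
 * value) and return as soon as the tail has been received. */
static const struct restore_opts default_migration_opts = {
    .io_timeout_ms = 60 * 1000,
    .mode = RESTORE_RETURN_AFTER_TAIL,
};

/* Hypothetical callback: receive one checkpoint (pages + tail). Returns
 * 0 on success, -1 on IO error or timeout. */
typedef int (*read_checkpoint_fn)(void *ctx, int timeout_ms);

/* Returns the number of checkpoints applied, or -1 if even the first
 * checkpoint could not be read. In Remus mode a later IO error is the
 * normal end of the stream: the last complete checkpoint is used. */
static int restore_loop(const struct restore_opts *opts,
                        read_checkpoint_fn read_cp, void *ctx)
{
    int applied = 0;

    if (read_cp(ctx, opts->io_timeout_ms) != 0)
        return -1;
    applied++;

    if (opts->mode == RESTORE_RETURN_AFTER_TAIL)
        return applied;          /* done once the CPU state has arrived */

    while (read_cp(ctx, opts->io_timeout_ms) == 0)
        applied++;
    return applied;
}

/* Illustrative stand-in for the migration socket: delivers a fixed
 * number of checkpoints, then fails as a dropped connection would. */
struct fake_stream {
    int checkpoints_left;
    int reads;
};

static int fake_read_checkpoint(void *ctx, int timeout_ms)
{
    struct fake_stream *s = ctx;
    (void)timeout_ms;
    s->reads++;
    if (s->checkpoints_left == 0)
        return -1;
    s->checkpoints_left--;
    return 0;
}
```

For example, restoring from a fake stream holding three checkpoints returns 1 after a single read with the default migration options, and 3 (after a fourth, failing read) in Remus mode.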
> > 
> > In the case (b)(i), which should be the usual case, the behaviour
> > should be that which we would get if changeset 20406:0f893b8f7c15 was
> > reverted.  The offending code is mostly this, from 20406:
> > 
> > +    // DPRINTF("Buffered checkpoint\n");
> > +
> > +    if ( pagebuf_get(&pagebuf, io_fd, xc_handle, dom) ) {
> > +        ERROR("error when buffering batch, finishing\n");
> > +        goto finish;
> > +    }
> > +    memset(&tmptail, 0, sizeof(tmptail));
> > +    if ( buffer_tail(&tmptail, io_fd, max_vcpu_id, vcpumap,
> > +                     ext_vcpucontext) < 0 ) {
> > +        ERROR ("error buffering image tail, finishing");
> > +        goto finish;
> > +    }
> > +    tailbuf_free(&tailbuf);
> > +    memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
> > +
> > +    goto loadpages;
> > +
> > +  finish:
> > 
> > Ian.
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
