qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] When does live migration give up?
Date: Wed, 04 Sep 2013 20:34:27 +0200
Message-ID: <52277D33.1060602@redhat.com>
In-Reply-To: <4E05C0A561DF799779B6023D@nimrod.local>

On 04/09/2013 20:05, Alex Bligh wrote:
> Paolo,
> 
> --On 4 September 2013 19:07:53 +0200 Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> 
>> On 04/09/2013 17:24, Alex Bligh wrote:
>>> We have seen a situation when migrating about 50 VMs at once where some
>>> of them fail. I think this is because they are dirtying pages faster
>>> than
>>> they can be transmitted.
>>
>> No, migration never "gives up".  It may never converge, but it keeps
>> trying until cancelled.
>>
>> Could it be that you are choosing migration server ports from a small
>> range, and some of them are failing because two migrations pick the same
>> random port for the destination (which is where the server socket lies)?
> 
> Should not be that. We create FDs (which are sockets) and pass them in at
> both ends.

Do you mean something like this?

   destination
      socket()
      bind() to { sin_port = 0, sin_addr.s_addr = INADDR_ANY }
      listen()
      getsockname()
      send address to source
      accept()
      start QEMU with file descriptor returned by accept

   source
      read address
      socket()
      connect()
      pass socket file descriptor to QEMU and migrate to it

Anything that doesn't use sin_port = 0 and getsockname() is prone to
race conditions.
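
A minimal sketch of the sequence above, in Python rather than C for
brevity. The function names (open_destination_listener, connect_source,
demo) are made up for illustration, and both ends run in one process
over loopback; in a real setup the address would travel over whatever
management channel you already have, and the resulting descriptors
would be handed to QEMU, which is not shown here. Note that when
binding to INADDR_ANY, getsockname() reports 0.0.0.0 as the host, so
only the port is useful to the source; the sketch binds to 127.0.0.1
to sidestep that.

```python
import socket
import threading


def open_destination_listener(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks a free port, then report it."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))          # port 0: let the kernel choose
    srv.listen(1)
    return srv, srv.getsockname()  # (host, port) actually bound


def connect_source(address):
    """Source side: connect to the address reported by the destination."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(address)
    return cli


def demo():
    """Run both ends locally and push a few bytes through the socket."""
    srv, address = open_destination_listener()
    accepted = {}

    def accept_once():
        conn, _peer = srv.accept()
        accepted["conn"] = conn

    t = threading.Thread(target=accept_once)
    t.start()
    cli = connect_source(address)
    t.join()

    cli.sendall(b"ping")
    data = b""
    while len(data) < 4:         # recv() may return fewer bytes than asked
        chunk = accepted["conn"].recv(4 - len(data))
        if not chunk:
            break
        data += chunk

    cli.close()
    accepted["conn"].close()
    srv.close()
    return address[1], data
```

Because the kernel assigns the port at bind() time, two listeners
opened concurrently cannot end up with the same one, which is the
property that matters when 50 migrations start at once.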

> Approx 10% of migrations die after many minutes on the
> customer's platform. This does not appear to happen if migrations are
> not carried out 50 at a time.

Dying after many minutes usually means that the destination is not set
up the same as the source, as you said below.

Paolo

> We appear to be getting something other than 'ms' returned through the
> monitoring system. Unhelpfully what that is is not logged.
> 
> Is there anything (apart from the socket closing prematurely) which can
> cause a failed migration after many minutes? We've seen problems where
> the destination is not set up the same as the source (e.g. different
> numbers of NICs) but IIRC that fails much earlier.
> 
> To make things easier (cough), this is qemu 1.0 (as shipped with Ubuntu
> Precise).
> 

Thread overview: 6+ messages
2013-09-04 15:24 [Qemu-devel] When does live migration give up? Alex Bligh
2013-09-04 17:07 ` Paolo Bonzini
2013-09-04 18:05   ` Alex Bligh
2013-09-04 18:34     ` Paolo Bonzini [this message]
2013-09-04 22:37       ` Alex Bligh
2013-09-04 18:35     ` Alex Bligh
