All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	Alexey Perevalov <a.perevalov@samsung.com>,
	Juan Quintela <quintela@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	berrange@redhat.com
Subject: Re: [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery
Date: Mon, 21 Aug 2017 15:47:44 +0800	[thread overview]
Message-ID: <20170821074744.GA30356@pxdev.xzpeter.org> (raw)
In-Reply-To: <20170803155753.GD3673@work-vm>

On Thu, Aug 03, 2017 at 04:57:54PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > As we all know that postcopy migration has a potential risk to lost
> > the VM if the network is broken during the migration. This series
> > tries to solve the problem by allowing the migration to pause at the
> > failure point, and do recovery after the link is reconnected.
> > 
> > There was existing work on this issue from Md Haris Iqbal:
> > 
> > https://lists.nongnu.org/archive/html/qemu-devel/2016-08/msg03468.html
> > 
> > This series is a totally re-work of the issue, based on Alexey
> > Perevalov's recved bitmap v8 series:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06401.html
> 
> 
> Hi Peter,
>   See my comments on the individual patches; but at a top level I think
> it looks pretty good.
> 
>   I still worry about two related things, one I see is similar to what
> you discussed with Dan.
> 
>   1) Is what happens if we end up hanging on a missing page with the bql
>   taken and can't use the monitor.
>   Checking my notes from when I was chatting to Harris last year,
>     'info cpu' was pretty good at doing this because it needed the vcpus
>   to come out of their loops, so if any vcpu was blocked on memory we'd
>   block waiting.  The other case is where an emulated IO device accesses
>   it, and that's easiest by doing a migrate with inbound network
>   traffic.
>   In this case, will your 'accept' still work?

It will not work.

To solve this problem, I posted the series:

  [RFC 0/6] monitor: allow per-monitor thread

Let's see whether that is acceptable.

> 
>   2) Similar to Dan's question of what happens if the network just hangs
>   as opposed to gives an error;  it should eventually sort itself out
>   with TCP timeouts - eventually.  Perhaps the easiest way to test this
>   is just to add a iptables -j DROP  for the migration port - it's
>   probably easier to trigger (1).

Yeah, so I think I'll just avoid considering this for now.  Thanks,

-- 
Peter Xu

      reply	other threads:[~2017-08-21  7:47 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-28  8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
2017-07-31 16:34   ` Dr. David Alan Gilbert
2017-08-01  2:11     ` Peter Xu
2017-08-01  5:48       ` Alexey Perevalov
2017-08-01  6:02         ` Peter Xu
2017-08-01  6:12           ` Alexey Perevalov
2017-07-28  8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
2017-07-31 16:39   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
2017-07-31 16:53   ` Dr. David Alan Gilbert
2017-08-01  2:25     ` Peter Xu
2017-08-01  8:32       ` Daniel P. Berrange
2017-08-01  8:55         ` Dr. David Alan Gilbert
2017-08-02  3:21           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
2017-07-31 17:11   ` Dr. David Alan Gilbert
2017-08-01  2:43     ` Peter Xu
2017-08-01  8:40       ` Dr. David Alan Gilbert
2017-08-02  3:20         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
2017-07-31 17:58   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
2017-07-31 18:27   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
2017-07-31 18:39   ` Dr. David Alan Gilbert
2017-08-01  5:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
2017-07-31 18:42   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-07-31 18:45   ` Dr. David Alan Gilbert
2017-08-01  3:01     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
2017-07-31 18:52   ` Dr. David Alan Gilbert
2017-08-01  3:13     ` Peter Xu
2017-08-01  8:50       ` Dr. David Alan Gilbert
2017-08-02  3:31         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
2017-07-28 15:53   ` Eric Blake
2017-07-31  7:02     ` Peter Xu
2017-07-31 19:06   ` Dr. David Alan Gilbert
2017-08-01  6:28     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
2017-08-01  9:47   ` Dr. David Alan Gilbert
2017-08-02  5:06     ` Peter Xu
2017-08-03 14:03       ` Dr. David Alan Gilbert
2017-08-04  3:43         ` Peter Xu
2017-08-04  9:33           ` Dr. David Alan Gilbert
2017-08-04  9:44             ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
2017-08-01 10:01   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
2017-08-01 10:30   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
2017-08-01 10:41   ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
2017-07-28 15:57   ` Eric Blake
2017-07-31  7:05     ` Peter Xu
2017-08-01 10:42   ` Dr. David Alan Gilbert
2017-08-01 11:03   ` Daniel P. Berrange
2017-08-02  5:56     ` Peter Xu
2017-08-02  9:28       ` Daniel P. Berrange
2017-07-28  8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
2017-08-01 10:59   ` Dr. David Alan Gilbert
2017-08-02  6:14     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
2017-08-01 11:36   ` Dr. David Alan Gilbert
2017-08-02  6:42     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
2017-08-01 10:56   ` Daniel P. Berrange
2017-08-02  7:02     ` Peter Xu
2017-08-02  9:26       ` Daniel P. Berrange
2017-08-02 11:02         ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-03  9:28   ` Dr. David Alan Gilbert
2017-08-04  5:46     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-03  9:49   ` Dr. David Alan Gilbert
2017-08-04  6:08     ` Peter Xu
2017-08-04  6:15       ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-08-03 10:50   ` Dr. David Alan Gilbert
2017-08-04  6:59     ` Peter Xu
2017-08-04  9:49       ` Dr. David Alan Gilbert
2017-08-07  6:11         ` Peter Xu
2017-08-07  9:04           ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-08-03 11:05   ` Dr. David Alan Gilbert
2017-08-04  7:04     ` Peter Xu
2017-08-04  7:09       ` Peter Xu
2017-08-04  8:30         ` Dr. David Alan Gilbert
2017-08-04  9:22           ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-08-03 11:21   ` Dr. David Alan Gilbert
2017-08-04  7:23     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-08-03 11:38   ` Dr. David Alan Gilbert
2017-08-04  7:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
2017-08-03 11:56   ` Dr. David Alan Gilbert
2017-08-04  7:49     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
2017-08-03 12:37   ` Dr. David Alan Gilbert
2017-08-04  8:39     ` Peter Xu
2017-07-28  8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
2017-08-03 13:47   ` Dr. David Alan Gilbert
2017-08-04  9:05     ` Peter Xu
2017-08-04  9:53       ` Dr. David Alan Gilbert
2017-07-28  8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
2017-08-03 13:54   ` Dr. David Alan Gilbert
2017-08-04  8:52     ` Peter Xu
2017-08-04  9:52       ` Dr. David Alan Gilbert
2017-08-07  6:57         ` Peter Xu
2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-08-03 15:57 ` Dr. David Alan Gilbert
2017-08-21  7:47   ` Peter Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170821074744.GA30356@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=a.perevalov@samsung.com \
    --cc=aarcange@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.