qemu-devel.nongnu.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Christopher Covington <cov@codeaurora.org>
Cc: Robert Love <rlove@google.com>, Dave Hansen <dave@sr71.net>,
	Jan Kara <jack@suse.cz>,
	kvm@vger.kernel.org, Neil Brown <neilb@suse.de>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	qemu-devel@nongnu.org, crml <criu@openvz.org>,
	linux-mm@kvack.org, KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Michel Lespinasse <walken@google.com>,
	Taras Glek <tglek@mozilla.com>,
	Juan Quintela <quintela@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Isaku Yamahata <yamahata@valinux.co.jp>,
	Mel Gorman <mgorman@suse.de>,
	Android Kernel Team <kernel-team@android.com>,
	Andrew Jones <drjones@redhat.com>, Mel Gorman <mel@csn.ul.ie>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Keith Packard <keithp@keithp.com>,
	Wenchao Xia <wenchaoqemu@gmail.com>,
	linux-kernel@vger.kernel.org, Minchan Kim <minchan@kernel.org>,
	Dmitry Adamushko <dmitry.adamushko@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mike Hommey <mh@glandium.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [Qemu-devel] [PATCH 00/10] RFC: userfault
Date: Thu, 3 Jul 2014 16:08:53 +0200
Message-ID: <20140703140853.GG21667@redhat.com>
In-Reply-To: <53B55E63.7080309@codeaurora.org>

Hi Christopher,

On Thu, Jul 03, 2014 at 09:45:07AM -0400, Christopher Covington wrote:
> CRIU uses the soft dirty bit in /proc/pid/clear_refs and /proc/pid/pagemap to
> implement its pre-copy memory migration.
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/soft-dirty.txt
> 
> Would it make sense to use a similar interaction model of peeking and poking
> at /proc/pid/ files for post-copy memory migration facilities?

We plan to use the pagemap information to optimize precopy live
migration, but that's orthogonal to postcopy live migration.

We already combine precopy and postcopy live migration.

In addition to the dirty bit tracking with the softdirty clear_refs
feature, the pagemap bits can also tell, for example, which pages are
missing in the source node, instead of the current memcmp(0) that
avoids transferring zero pages. With pagemap we can skip a superfluous
zero page fault (David suggested this).
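
As a rough illustration of the pagemap idea above, the sketch below
decodes one 64-bit /proc/pid/pagemap entry. The bit positions (63 for
present, 62 for swapped, 55 for soft-dirty) follow the kernel's pagemap
documentation; the function names and the "missing" shorthand (neither
present nor swapped) are assumptions made for this example, not code
from QEMU or CRIU.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8
PM_PRESENT = 1 << 63     # page is present in RAM
PM_SWAPPED = 1 << 62     # page is in swap
PM_SOFT_DIRTY = 1 << 55  # PTE written since the last clear_refs reset


def decode_pagemap_entry(entry):
    """Decode the flag bits of one 64-bit pagemap entry."""
    present = bool(entry & PM_PRESENT)
    swapped = bool(entry & PM_SWAPPED)
    return {
        "present": present,
        "swapped": swapped,
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
        # Assumption for this sketch: a page that is neither in RAM nor
        # in swap is treated as "missing" (never populated).
        "missing": not present and not swapped,
    }


def read_pagemap_entry(pid, vaddr, page_size=4096):
    """Read the raw pagemap entry covering virtual address vaddr (Linux only)."""
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_BYTES
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(PAGEMAP_ENTRY_BYTES)
    if len(data) != PAGEMAP_ENTRY_BYTES:
        raise OSError("short read from pagemap")
    return struct.unpack("=Q", data)[0]
```

A sender could, under these assumptions, skip any page whose decoded
entry reports "missing" rather than faulting it in and comparing it
against zero.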

Postcopy live migration poses a different problem. Without postcopy
there's no way to migrate 100GByte guests with heavy load inside
them. In fact, even the first "optimistic" precopy pass should only
migrate those pages that already got the dirty bit set by the time we
attempt to send them.

With postcopy we can also guarantee that the maximum amount of data
transferred during precopy+postcopy is twice the size of the guest. So
you know exactly the maximum time live migration will take depending
on your network bandwidth and it cannot fail no matter the load or the
size of the guest. Slowing down the guest with autoconverge isn't
needed anymore.
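
The bound above turns into simple arithmetic. The sketch below is only
a back-of-the-envelope helper: the function name is made up for this
example, and it assumes the link runs at its nominal rate with no
protocol overhead, which a real migration would not achieve.

```python
def worst_case_migration_seconds(guest_bytes, link_bits_per_second):
    """Upper bound on transfer time if at most 2x the guest size is sent."""
    if link_bits_per_second <= 0:
        raise ValueError("link bandwidth must be positive")
    max_bytes = 2 * guest_bytes  # precopy + postcopy bound stated above
    return max_bytes * 8 / link_bits_per_second


# Example: a 100 GByte guest (10**9 bytes per GByte) over a 10 Gbit/s link.
guest = 100 * 10**9
link = 10 * 10**9
print(worst_case_migration_seconds(guest, link))  # 160.0
```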

The userfault only happens in the destination node. The problem we
face is that we must start the guest in the destination node even
though a significant amount of its memory is still in the source node.

With postcopy migration the pages are neither dirty nor present in the
destination node, they're just holes, and in fact we already know
exactly which ones are missing without having to check pagemap.

It's up to the guest OS which pages it decides to touch, we cannot
know. We already know where the holes are, we don't know if the guest
will touch the holes during its runtime while the memory is still
externalized.

If the guest touches any hole we need to stop the guest somehow and we
must be notified immediately so we can transfer the page, fill the
hole, and let it continue ASAP.
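
The stop, fetch, fill, continue sequence can be modelled in a few
lines. This is a toy userspace simulation only: the class and its
method names are hypothetical, it does not use the userfaultfd or
MADV_USERFAULT interfaces from the patch series, and the "source node"
is just a dictionary standing in for a network transfer.

```python
class PostcopyDestination:
    """Toy model of destination memory that still contains holes."""

    def __init__(self, num_pages, received, fetch_page):
        self.num_pages = num_pages
        self.pages = dict(received)   # pages already copied during precopy
        self.fetch_page = fetch_page  # stand-in for a request to the source
        self.faults = []              # page numbers that hit a hole

    def holes(self):
        """Pages still on the source; known without consulting pagemap."""
        return [n for n in range(self.num_pages) if n not in self.pages]

    def touch(self, page_no):
        """Guest access: on a hole, block, fetch, fill, then continue."""
        if page_no not in self.pages:
            self.faults.append(page_no)                   # "userfault"
            self.pages[page_no] = self.fetch_page(page_no)  # fill the hole
        return self.pages[page_no]


# Four-page guest; only page 0 arrived before the guest was started.
source_memory = {n: f"page-{n}".encode() for n in range(4)}
dest = PostcopyDestination(4, {0: source_memory[0]}, source_memory.__getitem__)
print(dest.holes())   # [1, 2, 3]
print(dest.touch(2))  # b'page-2'
print(dest.faults)    # [2]
```

The point of the model is that the fault is synchronous: the access
does not return until the hole is filled, which polling a /proc file
cannot provide.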

pagemap/clear_refs can't stop the guest and notify us immediately
when the guest touches a hole.

It's not just about the guest shadow mmu accesses, it could also be
O_DIRECT from qemu that triggers the fault. In that case GUP stops,
we fill the hole, and then GUP and O_DIRECT succeed without even
noticing they were stopped by a userfault.

Thanks,
Andrea

Thread overview: 18+ messages
2014-07-02 16:50 [Qemu-devel] [PATCH 00/10] RFC: userfault Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 01/10] mm: madvise MADV_USERFAULT: prepare vm_flags to allow more than 32bits Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 02/10] mm: madvise MADV_USERFAULT Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 03/10] mm: PT lock: export double_pt_lock/unlock Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 04/10] mm: rmap preparation for remap_anon_pages Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 05/10] mm: swp_entry_swapcount Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 06/10] mm: sys_remap_anon_pages Andrea Arcangeli
2014-07-04 11:30   ` Michael Kerrisk
2014-07-02 16:50 ` [Qemu-devel] [PATCH 07/10] waitqueue: add nr wake parameter to __wake_up_locked_key Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 08/10] userfaultfd: add new syscall to provide memory externalization Andrea Arcangeli
2014-07-03  1:56   ` Andy Lutomirski
2014-07-03 13:19     ` Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 09/10] userfaultfd: make userfaultfd_write non blocking Andrea Arcangeli
2014-07-02 16:50 ` [Qemu-devel] [PATCH 10/10] userfaultfd: use VM_FAULT_RETRY in handle_userfault() Andrea Arcangeli
2014-07-03  1:51 ` [Qemu-devel] [PATCH 00/10] RFC: userfault Andy Lutomirski
2014-07-03 13:45 ` Christopher Covington
2014-07-03 14:08   ` Andrea Arcangeli [this message]
2014-07-03 15:41 ` Dave Hansen
