From: Andrea Arcangeli <aarcange@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>,
t.hirofuchi@aist.go.jp, qemu-devel@nongnu.org,
kvm@vger.kernel.org, satoshi.itoh@aist.go.jp
Subject: Re: [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy
Date: Mon, 2 Jan 2012 18:05:51 +0100
Message-ID: <20120102170551.GF4172@redhat.com>
In-Reply-To: <4EFC8EE9.9030802@redhat.com>
On Thu, Dec 29, 2011 at 06:01:45PM +0200, Avi Kivity wrote:
> On 12/29/2011 06:00 PM, Avi Kivity wrote:
> > The NFS client has exactly the same issue, if you mount it with the intr
> > option. In fact you could use the NFS client as a trivial umem/cuse
> > prototype.
>
> Actually, NFS can return SIGBUS, it doesn't care about restarting daemons.
During KVMForum I suggested to a few people that it could be done
entirely in userland with PROT_NONE. The problem is that if we do it in
userland with the current functionality, you'll run out of VMAs and
slow down performance too much.
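As an illustrative sketch of the VMA problem, assuming Linux with /proc
mounted: the code below maps a PROT_NONE region, makes every other page
accessible with mprotect(), and counts VMAs through /proc/self/maps. The
helper names count_vmas and vmas_added_by_sparse_mprotect are hypothetical
and exist only for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the VMAs of the calling process: /proc/self/maps has one line per VMA. */
static long count_vmas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long n = 0;
    int c;

    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/*
 * Map npages as PROT_NONE, then make every other page accessible, roughly
 * what a pure-userland postcopy would do as pages arrive out of order.
 * Returns how many VMAs the mprotect() calls added, or -1 on error.
 */
static long vmas_added_by_sparse_mprotect(size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t len = npages * (size_t)psz;
    long before, after;
    char *area;
    size_t i;

    area = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        return -1;

    before = count_vmas();
    for (i = 0; i < npages; i += 2) {
        /* Each protection change on a single page splits the VMA around it. */
        if (mprotect(area + i * (size_t)psz, (size_t)psz,
                     PROT_READ | PROT_WRITE) != 0) {
            munmap(area, len);
            return -1;
        }
    }
    after = count_vmas();

    munmap(area, len);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

Calling vmas_added_by_sparse_mprotect(64) from main() and printing the
result should show the VMA count growing roughly in step with the number of
pages touched, which is why a guest-sized mapping would likely hit the
per-process VMA limit.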
But all you need is the ability to map single pages in the address
space. The only special requirement is that a new vma must not be
created during the map operation. It'd be very similar to
remap_file_pages for MAP_SHARED, which was also created to avoid having
to create new vmas on a large MAP_SHARED mapping and for no other reason
at all. In our case we deal with a large MAP_ANONYMOUS mapping and we
must alter the pte without creating new vmas, but the problem is very
similar to remap_file_pages.
Qemu in the dst node can do:
mmap(MAP_ANONYMOUS....)
fault_area_prepare(start, end, signalnr)
fault_area_prepare will map the range with the magic pte.
Then when the signalnr fires, you do:
send(givemepageX)
recv(&tmpaddr_aligned, PAGE_SIZE,...);
fault_area_map(final_dest_aligned, tmpaddr_aligned, size)
fault_area_map will check that the two vmas mapping
final_dest_aligned and tmpaddr_aligned have the same vma->vm_pgprot
and various other vma bits, and if all is ok, it'll just copy the pte
from tmpaddr_aligned to final_dest_aligned and update the
page->index. It can fail if the page is shared, to avoid dealing with
the non-linearity of a page mapped in multiple vmas.
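The flow above can be approximated today in plain userland, as a rough
sketch only. It assumes Linux, uses PROT_NONE plus a SIGSEGV handler in
place of the proposed magic pte and signalnr, a memset() in place of the
send()/recv() exchange, and mremap() with MREMAP_FIXED in place of
fault_area_map. Unlike the proposed call, mremap() does create a new vma
per page, which is exactly the cost being discussed. The names on_fault,
postcopy_demo and the g_* globals are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *g_area;                    /* start of the guarded range */
static size_t g_len;                    /* length of the guarded range */
static long g_psz;                      /* page size */
static volatile sig_atomic_t g_faults;  /* pages faulted in so far */

/* mmap/mremap are plain syscalls; calling them here is fine in practice
 * for a synchronous fault, although POSIX does not list them as
 * async-signal-safe. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    char *dest = (char *)(addr & ~((uintptr_t)g_psz - 1));
    char *tmp;

    (void)sig;
    (void)ctx;
    if (dest < g_area || dest >= g_area + g_len)
        _exit(1);                       /* a genuine crash, not our range */

    tmp = mmap(NULL, (size_t)g_psz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (tmp == MAP_FAILED)
        _exit(2);

    /* Stand-in for send(givemepageX) + recv(): fill with page index + 1. */
    memset(tmp, (int)((dest - g_area) / g_psz) + 1, (size_t)g_psz);

    /* Move the filled page over the faulting address without copying it. */
    if (mremap(tmp, (size_t)g_psz, (size_t)g_psz,
               MREMAP_MAYMOVE | MREMAP_FIXED, dest) == MAP_FAILED)
        _exit(3);
    g_faults++;
}

/* Guard npages, touch one of them, and return the byte read from it. */
static int postcopy_demo(size_t npages, size_t page)
{
    struct sigaction sa, old;
    int value;

    g_psz = sysconf(_SC_PAGESIZE);
    g_len = npages * (size_t)g_psz;
    g_faults = 0;
    g_area = mmap(NULL, g_len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_area == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(g_area, g_len);
        return -1;
    }

    /* First access faults; the handler maps the page and the load restarts. */
    value = (unsigned char)((volatile char *)g_area)[page * (size_t)g_psz];

    sigaction(SIGSEGV, &old, NULL);
    munmap(g_area, g_len);
    return value;
}
```

For instance, postcopy_demo(16, 5) touches page 5 of a 16-page range; the
handler fills that page with the value 6 before the faulting load is
restarted.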
You basically need a bypass to avoid altering the pgprot of the vma,
and to enter into the pte a "magic" thing that fires signal handlers
if accessed, without having to create new vmas. gup/gup_fast and stuff
should just always fall back into handle_mm_fault when encountering such a
thing, returning failure as if gup_fast was run on an address beyond
the end of the i_size in the MAP_SHARED case.
THP already works on /dev/zero mmaps as long as they're MAP_PRIVATE.
KSM should work too, but I doubt anybody tested it on a MAP_PRIVATE
mapping of /dev/zero.
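For reference, a small sketch of the kind of mapping meant here, assuming
Linux: a MAP_PRIVATE mapping of /dev/zero with a THP hint. The madvise()
call is only a hint and can fail on kernels built without THP, so its
result is ignored. The name devzero_private_demo and the 4 MiB length are
arbitrary choices for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_LEN (4UL << 20)    /* 4 MiB, arbitrary for the example */

/* Returns 0 if the private /dev/zero mapping behaves like zeroed memory. */
static int devzero_private_demo(void)
{
    int fd = open("/dev/zero", O_RDWR);
    char *p;
    int ok;

    if (fd < 0)
        return -1;
    p = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Ask for transparent hugepages; a hint only, may be refused. */
    (void)madvise(p, DEMO_LEN, MADV_HUGEPAGE);

    ok = (p[0] == 0);           /* reads as zero-filled memory */
    p[0] = 1;                   /* private write, never reaches the device */
    ok = ok && p[0] == 1 && p[DEMO_LEN - 1] == 0;

    munmap(p, DEMO_LEN);
    return ok ? 0 : -1;
}
```

Whether hugepages actually back the range can be checked while the mapping
is live by looking at the AnonHugePages field in /proc/self/smaps; the
sketch does not assert on it because that depends on the system's THP
configuration.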
The device driver provides an advantage in being self contained, but I
doubt it's simpler. I suppose after migration is complete you'll still
switch the vma back to a regular anonymous vma, leading to the same
result?
Patch 2/2 is small and self contained, so it's quite attractive. I
didn't see patch 1/2; was it posted?
Thanks,
Andrea
Thread overview: 42+ messages
2011-12-29 1:26 [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy Isaku Yamahata
2011-12-29 1:26 ` [Qemu-devel] [PATCH 1/2] export necessary symbols Isaku Yamahata
2011-12-29 1:26 ` [Qemu-devel] [PATCH 2/2] umem: chardevice for kvm postcopy Isaku Yamahata
2011-12-29 11:17 ` Avi Kivity
2011-12-29 12:22 ` Isaku Yamahata
2011-12-29 12:47 ` Avi Kivity
2012-01-05 4:08 ` [Qemu-devel] Re: " thfbjyddx
2012-01-05 10:48 ` [Qemu-devel] Re: " Isaku Yamahata
2012-01-05 11:10 ` Tommy
2012-01-05 12:18 ` Isaku Yamahata
2012-01-05 15:02 ` Tommy Tang
[not found] ` <4F05BB68.9050302@hotmail.com>
2012-01-05 15:05 ` Tommy Tang
2012-01-06 7:02 ` thfbjyddx
2012-01-06 17:13 ` [Qemu-devel] Re: [PATCH 2/2] umem: chardevice for kvm postcopy Isaku Yamahata
2011-12-29 1:31 ` [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy Isaku Yamahata
2011-12-29 11:24 ` Avi Kivity
2011-12-29 12:39 ` Isaku Yamahata
2011-12-29 12:55 ` Avi Kivity
2011-12-29 13:49 ` Isaku Yamahata
2011-12-29 13:52 ` Avi Kivity
2011-12-29 14:18 ` Isaku Yamahata
2011-12-29 14:35 ` Avi Kivity
2011-12-29 14:49 ` Isaku Yamahata
2011-12-29 14:55 ` Avi Kivity
2011-12-29 15:53 ` Isaku Yamahata
2011-12-29 16:00 ` Avi Kivity
2011-12-29 16:01 ` Avi Kivity
2012-01-02 17:05 ` Andrea Arcangeli [this message]
2012-01-02 17:55 ` Paolo Bonzini
2012-01-03 14:25 ` Andrea Arcangeli
2012-01-12 13:57 ` Avi Kivity
2012-01-13 2:06 ` Andrea Arcangeli
2012-01-04 3:03 ` Isaku Yamahata
2012-01-12 13:59 ` Avi Kivity
2012-01-13 1:09 ` Benoit Hudzia
2012-01-13 1:31 ` Takuya Yoshikawa
2012-01-13 9:40 ` Benoit Hudzia
2012-01-13 2:03 ` Isaku Yamahata
2012-01-13 2:15 ` Isaku Yamahata
2012-01-13 9:55 ` Benoit Hudzia
2012-01-13 9:48 ` Benoit Hudzia
2012-01-13 2:09 ` Andrea Arcangeli