linux-mm.kvack.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
	wenchao <wenchaolinux@gmail.com>, Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	hughd@google.com, walken@google.com,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	kirill.shutemov@linux.intel.com,
	Anthony Liguori <anthony@codemonkey.ws>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [RFC PATCH V1 0/6] mm: add a new option MREMAP_DUP to mmrep syscall
Date: Mon, 30 Dec 2013 18:23:42 -0200
Message-ID: <20131230202342.GA7973@amt.cnet>
In-Reply-To: <52AFE828.3010500@linux.vnet.ibm.com>

On Tue, Dec 17, 2013 at 01:59:04PM +0800, Xiao Guangrong wrote:
> 
> CCed KVM guys.
> 
> On 05/10/2013 01:11 PM, Stefan Hajnoczi wrote:
> > On Fri, May 10, 2013 at 4:28 AM, wenchao <wenchaolinux@gmail.com> wrote:
> >> On 2013-5-9 22:13, Mel Gorman wrote:
> >>
> >>> On Thu, May 09, 2013 at 05:50:05PM +0800, wenchaolinux@gmail.com wrote:
> >>>>
> >>>> From: Wenchao Xia <wenchaolinux@gmail.com>
> >>>>
> >>>>    This series tries to enable the mremap syscall to CoW a private
> >>>> memory region, just as fork() does. As a result, a user space
> >>>> application would get a mirror of that region, which can be used as
> >>>> a snapshot for further processing.
> >>>>
> >>>
> >>> Why not just fork()? Even if the application is threaded, it should be
> >>> manageable to handle fork just for processing the private memory region
> >>> in question. I'm having trouble figuring out what sort of application
> >>> would require an interface like this.
> >>>
> >>   It has some drawbacks: parent-child communication and, sometimes,
> >> page copies.
> >>   I'd like to snapshot a QEMU guest's RAM. The current solution is:
> >> 1) fork()
> >> 2) pipe guest RAM data from child to parent.
> >> 3) parent writes down the contents.
> >>
> >>   To avoid complex communication for data control and file content
> >> protection, the parent rather than the child handles the data, fed
> >> through a pipe, but this brings an additional copy. I think an explicit
> >> API that CoW-maps a memory region inside one process could avoid that;
> >> it would be faster, CoW fewer pages, and make user space code nicer.
> > 
> > A new Linux-specific API is not portable and not available on existing
> > hosts.  Since QEMU supports non-Linux host operating systems the
> > fork() approach is preferable.
> > 
> > If you're worried about the memory copy - which should be benchmarked
> > - then vmsplice(2) can be used in the child process and splice(2) can
> > be used in the parent.  It probably doesn't help though since QEMU
> > scans RAM pages to find all-zero pages before sending them over the
> > socket, and at that point the memory copy might not make much
> > difference.
> > 
> > Perhaps other applications can use this new flag better, but for QEMU
> > I think fork()'s portability is more important than the convenience of
> > accessing the CoW pages in the same process.
> 
> Yup, I agree with you that a new syscall is sometimes not a good solution.
> 
> Currently, we're working on live-update[1], which will be enabled on QEMU first.
> This feature lets the guest run on the new QEMU binary smoothly without a
> restart, which is good for us when doing security updates.
> 
> In this case, we need to move the guest memory from the old QEMU instance to
> the new one. fork() cannot help because we need to exec() a new instance,
> after which all memory mappings will be destroyed.
> 
> We tried to enable SPLICE_F_MOVE[2] for vmsplice() to move the memory without
> a memory copy, but the performance isn't as good as we expected, due to
> some limitations: the page size, locking, the message-size limit on pipes, etc.
> Of course, we will continue to improve this, but wenchao's patch seems to be
> a new direction for us.
> 
> To coordinate with your fork() approach, maybe we can introduce a new flag
> for VMAs, something like VM_KEEP_ONEXEC, to tell exec() not to destroy
> this VMA. How about this, or do you guys have a new idea? We would really
> appreciate your suggestions.
> 
> [1] http://marc.info/?l=qemu-devel&m=138597598700844&w=2
> [2] https://lkml.org/lkml/2013/10/25/285

Hi,

What is the purpose of snapshotting guest RAM here, in the context of
local migration?
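
For readers following the thread, the fork()-based snapshot flow quoted
above (fork, pipe guest RAM from child to parent, parent writes down the
contents) can be sketched roughly as below. This is only an illustration
under assumptions, not QEMU code: struct snapshot, write_all(),
snapshot_begin() and snapshot_finish() are hypothetical names, and a plain
byte buffer stands in for guest RAM. It also shows the extra copy through
the pipe that the thread is concerned about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one in-flight snapshot. */
struct snapshot {
    pid_t child; /* process holding the frozen CoW view */
    int rfd;     /* read end of the pipe, owned by the parent */
};

/* Write exactly len bytes to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Step 1: fork(). The child sees `ram` as it was at fork time (private
 * pages are copy-on-write) and streams it into the pipe (step 2).
 */
static int snapshot_begin(const unsigned char *ram, size_t len,
                          struct snapshot *s)
{
    int fds[2];

    assert(s != NULL);
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send the frozen view to the parent, then exit. */
        close(fds[0]);
        _exit(write_all(fds[1], ram, len) == 0 ? 0 : 1);
    }

    /* Parent: keep only the read end and return to normal work. */
    close(fds[1]);
    s->child = pid;
    s->rfd = fds[0];
    return 0;
}

/*
 * Step 3: the parent drains the pipe into `out` (a real user would write
 * the data to a file instead) and reaps the child.
 */
static int snapshot_finish(struct snapshot *s, unsigned char *out, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(s->rfd, out + got, len - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(s->rfd);

    int status;
    if (waitpid(s->child, &status, 0) < 0)
        return -1;
    return (got == len && WIFEXITED(status) && WEXITSTATUS(status) == 0)
               ? 0 : -1;
}
```

A caller would invoke snapshot_begin() on the region, let the "guest" keep
modifying it, and later call snapshot_finish() to collect the contents as
they were at fork time. Every byte crosses the pipe once, which is the
additional copy that vmsplice()/splice() or an in-process CoW mapping
would try to avoid.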

