From: Marcelo Tosatti <mtosatti@redhat.com>
To: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
	wenchao <wenchaolinux@gmail.com>, Mel Gorman <mgorman@suse.de>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	hughd@google.com, walken@google.com,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	kirill.shutemov@linux.intel.com,
	Anthony Liguori <anthony@codemonkey.ws>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [RFC PATCH V1 0/6] mm: add a new option MREMAP_DUP to mmrep syscall
Date: Tue, 31 Dec 2013 16:53:29 -0200
Message-ID: <20131231185328.GA22414@amt.cnet>
In-Reply-To: <943AC3BD-C4EB-4B6C-BE34-AB921938AAF0@linux.vnet.ibm.com>

On Tue, Dec 31, 2013 at 08:06:51PM +0800, Xiao Guangrong wrote:
> 
> On Dec 31, 2013, at 4:23 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> > On Tue, Dec 17, 2013 at 01:59:04PM +0800, Xiao Guangrong wrote:
> >> 
> >> CCed KVM guys.
> >> 
> >> On 05/10/2013 01:11 PM, Stefan Hajnoczi wrote:
> >>> On Fri, May 10, 2013 at 4:28 AM, wenchao <wenchaolinux@gmail.com> wrote:
> >>>> On 2013-5-9 22:13, Mel Gorman wrote:
> >>>> 
> >>>>> On Thu, May 09, 2013 at 05:50:05PM +0800, wenchaolinux@gmail.com wrote:
> >>>>>> 
> >>>>>> From: Wenchao Xia <wenchaolinux@gmail.com>
> >>>>>> 
> >>>>>> This series tries to enable the mremap syscall to COW some private
> >>>>>> memory regions, just like fork() does. As a result, a user space
> >>>>>> application would get a mirror of those regions, which can be used
> >>>>>> as a snapshot for further processing.
> >>>>>> 
> >>>>> 
> >>>>> Why not just fork()? Even if the application was threaded it should be
> >>>>> manageable to handle fork just for processing the private memory region
> >>>>> in question. I'm having trouble figuring out what sort of application
> >>>>> would require an interface like this.
> >>>>> 
> >>>> It has some problems: parent-child communication, and sometimes
> >>>> page copying.
> >>>> I'd like to snapshot a qemu guest's RAM; the current solution is:
> >>>> 1) fork()
> >>>> 2) pipe guest RAM data from child to parent.
> >>>> 3) parent writes down the contents.
> >>>> 
> >>>> To avoid complex communication for data control and file content
> >>>> protection, we let the parent instead of the child handle the data
> >>>> via a pipe, but this brings an additional copy. I think an explicit
> >>>> API for COW-mapping a memory region inside one process could avoid
> >>>> it, be faster, COW fewer pages, and make the user space code nicer.
> >>> 
> >>> A new Linux-specific API is not portable and not available on existing
> >>> hosts.  Since QEMU supports non-Linux host operating systems the
> >>> fork() approach is preferable.
> >>> 
> >>> If you're worried about the memory copy - which should be benchmarked
> >>> - then vmsplice(2) can be used in the child process and splice(2) can
> >>> be used in the parent.  It probably doesn't help though since QEMU
> >>> scans RAM pages to find all-zero pages before sending them over the
> >>> socket, and at that point the memory copy might not make much
> >>> difference.
> >>> 
> >>> Perhaps other applications can use this new flag better, but for QEMU
> >>> I think fork()'s portability is more important than the convenience of
> >>> accessing the CoW pages in the same process.
> >> 
> >> Yup, I agree with you that the new syscall sometimes is not a good solution.
> >> 
> >> Currently, we're working on live-update[1], which will be enabled on Qemu
> >> first. This feature lets the guest move to a new Qemu binary smoothly,
> >> without a restart, which is good for us when doing security updates.
> >> 
> >> In this case, we need to move the guest memory from the old qemu instance
> >> to the new one. fork() cannot help because we need to exec() a new
> >> instance, after which all memory mappings are destroyed.
> >> 
> >> We tried to enable SPLICE_F_MOVE[2] for vmsplice() to move the memory
> >> without a memory copy, but the performance isn't as good as we expected
> >> due to some limitations: the page size, locking, the message-size limit
> >> on pipes, etc. Of course, we will continue to improve this, but wenchao's
> >> patch seems to be a new direction for us.
> >> 
> >> To coordinate with your fork() approach, maybe we can introduce a new flag
> >> for VMAs, something like VM_KEEP_ONEXEC, to tell exec() not to destroy
> >> this VMA. How about this, or do you guys have a new idea? We would really
> >> appreciate your suggestions.
> >> 
> >> [1] http://marc.info/?l=qemu-devel&m=138597598700844&w=2
> >> [2] https://lkml.org/lkml/2013/10/25/285
> > 
> > Hi,
> > 
> 
> Hi Marcelo,
> 
> 
> > What is the purpose of snapshotting guest RAM here, in the context of
> > local migration?
> 
> RAM snapshotting and local migration are different things.
> I asked for your suggestions here because I thought
> they need to do the same thing: move memory from one process
> to another in an efficient way. Your idea? :)

Another possibility is to use memory that is not anonymous for guest
RAM, such as hugetlbfs or tmpfs. 
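
As a rough illustration of why file-backed memory helps here: pages of a
MAP_SHARED file mapping belong to the file, not to the mapping, so they
outlive the mapping (and the process) that wrote them. This is a minimal
sketch, not QEMU code. The names map_guest_ram and remap_preserves_contents
are hypothetical, and the path is assumed to live on a tmpfs or hugetlbfs
mount such as /dev/shm (hugetlbfs would additionally need a size that is a
multiple of the huge page size).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes of the file at `path` as shared, file-backed memory.
 * Assumption: `path` is on tmpfs/hugetlbfs. Returns NULL on failure. */
void *map_guest_ram(const char *path, size_t size)
{
    assert(size > 0);   /* mmap() rejects zero-length mappings */

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);          /* the mapping keeps the file referenced */
    return ram == MAP_FAILED ? NULL : ram;
}

/* Write through one mapping, drop it, map the file again, and compare.
 * Returns 1 if the contents survived the remap, 0 otherwise. */
int remap_preserves_contents(const char *path)
{
    static const char msg[] = "guest page";
    const size_t size = 4096;

    char *first = map_guest_ram(path, size);
    if (first == NULL)
        return 0;
    memcpy(first, msg, sizeof msg);
    munmap(first, size);            /* stands in for the old process exiting */

    char *second = map_guest_ram(path, size);
    if (second == NULL)
        return 0;
    int same = memcmp(second, msg, sizeof msg) == 0;
    munmap(second, size);
    return same;
}
```

A new process started with exec() could call map_guest_ram() on the same
path and see the same pages without any copy.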

IIRC ksm and thp have limitations wrt tmpfs.

Still curious about RAM snapshotting.
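
For reference, the fork()-plus-pipe snapshot scheme from the quoted steps
1) to 3) can be sketched roughly as below. This is only an illustration
under stated assumptions, not what QEMU does: snapshot_via_fork is a
hypothetical name, and the memset() merely simulates the guest dirtying
memory while the child streams its frozen copy-on-write view.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy the fork()-time contents of ram[0..len) into out[0..len).
 * Returns 0 on success, -1 on failure. */
int snapshot_via_fork(char *ram, size_t len, char *out)
{
    assert(ram != NULL && out != NULL);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: its view of `ram` is frozen at fork() time (COW). */
        close(fds[0]);
        size_t off = 0;
        while (off < len) {
            ssize_t n = write(fds[1], ram + off, len - off);
            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        _exit(0);
    }

    /* Parent: keeps running; simulate the guest dirtying its RAM. */
    close(fds[1]);
    memset(ram, 0, len);

    /* Parent receives the snapshot; this is the extra copy discussed above. */
    size_t off = 0;
    while (off < len) {
        ssize_t n = read(fds[0], out + off, len - off);
        if (n <= 0)
            break;
        off += (size_t)n;
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (off != len || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

After the call, `out` holds the pre-fork contents even though the parent
has already overwritten `ram`.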


Thread overview (18+ messages):
2013-05-09  9:50 [RFC PATCH V1 0/6] mm: add a new option MREMAP_DUP to mmrep syscall wenchaolinux
2013-05-09  9:50 ` [RFC PATCH V1 1/6] mm: add parameter remove_old in move_huge_pmd() wenchaolinux
2013-05-09  9:50 ` [RFC PATCH V1 2/6] mm : allow copy between different addresses for copy_one_pte() wenchaolinux
2013-05-09  9:50 ` [RFC PATCH V1 3/6] mm : export rss vec helper functions wenchaolinux
2013-05-09  9:50 ` [RFC PATCH V1 4/6] mm : export is_cow_mapping() wenchaolinux
2013-05-09  9:50 ` [RFC PATCH V1 5/6] mm : add parameter remove_old in move_page_tables wenchaolinux
2013-05-09  9:50 ` [RFC PATCH V1 6/6] mm : add new option MREMAP_DUP to mremap() syscall wenchaolinux
2013-05-09 14:13 ` [RFC PATCH V1 0/6] mm: add a new option MREMAP_DUP to mmrep syscall Mel Gorman
2013-05-10  2:28   ` wenchao
2013-05-10  5:11     ` Stefan Hajnoczi
2013-12-17  5:59       ` Xiao Guangrong
2013-12-30 20:23         ` Marcelo Tosatti
2013-12-31 12:06           ` Xiao Guangrong
2013-12-31 18:53             ` Marcelo Tosatti [this message]
2014-01-06  7:41               ` Xiao Guangrong
2013-05-10  9:22     ` Kirill A. Shutemov
2013-05-11 14:16       ` Pavel Emelyanov
2013-05-13  2:40         ` wenchao
