From: Avi Kivity <avi@redhat.com>
To: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
	t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp,
	Andrea Arcangeli <aarcange@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>, Dor Laor <dlaor@redhat.com>,
	Yaniv Kaul <ykaul@redhat.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	"Nadav Har'El" <nyh@math.technion.ac.il>
Subject: Re: [PATCH][RFC] post copy chardevice (was Re: [RFC] postcopy livemigration proposal)
Date: Tue, 16 Aug 2011 06:40:35 -0700
Message-ID: <4E4A7353.9030708@redhat.com>
In-Reply-To: <20110816014226.GJ13791@valinux.co.jp>

On 08/15/2011 06:42 PM, Isaku Yamahata wrote:
> On Mon, Aug 15, 2011 at 12:29:37PM -0700, Avi Kivity wrote:
> >  On 08/12/2011 04:07 AM, Isaku Yamahata wrote:
> >>  This is a character device to hook page access.
> >>  The page fault in the area is reported to another user process by
> >>  this chardriver. Then, the process fills the page contents and
> >>  resolves the page fault.
> >
> >  Have you considered CUSE (character device in userspace, fs/fuse/cuse.c)?
>
> By looking at dev.c and cuse.c, it doesn't seem to support mmap and
> fault handler.

If performance is sufficient, adding mmap and fault support to CUSE
would be the preferred path: enhance an existing API that can be useful
to others, rather than add a new one.

> >>  +
> >>  +struct kvm_vmem_make_pages_present {
> >>  +	__u32 nr;
> >>  +	struct kvm_vmem_page_range __user *ranges;
> >>  +};
> >
> >  This is madvise(MADV_WILLNEED), is it not?
>
> Another process, not the qemu process, issues it,
> and it makes the pages present in the qemu process address space.

That process just issues these calls in a loop until all memory is
present, yes?  It seems those few lines could easily be added to qemu.
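
Roughly something like the following, as a sketch only.  The function
name and the chunking are my own assumptions, not code from the patch,
and whether MADV_WILLNEED has the desired effect on the vmem mapping is
exactly the open question above:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): walk a guest RAM mapping
 * in fixed-size chunks and ask the kernel to make each chunk present.
 * Returns 0 on success, -1 on a bad argument or a failed madvise().
 */
int prefault_guest_ram(void *base, size_t len, size_t chunk)
{
    long page = sysconf(_SC_PAGESIZE);
    char *p = base;
    size_t done = 0;

    /* chunk must be a non-zero multiple of the page size so that every
     * address handed to madvise() stays page-aligned. */
    if (page <= 0 || chunk == 0 || chunk % (size_t)page != 0)
        return -1;

    while (done < len) {
        size_t n = len - done < chunk ? len - done : chunk;

        if (madvise(p + done, n, MADV_WILLNEED) != 0)
            return -1;
        done += n;
    }
    return 0;
}
```

base is assumed to be the page-aligned start of the guest RAM mapping.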

>
>
> >  Can you explain these in some more detail?
>
>
> KVM_CREATE_VMEM_DEV: create vmem-dev device from kvm device
>                      for qemu
> KVM_CREATE_VMEM: create vmem device from vmem-dev device.
>                   (note:qemu creates more than one memory region.)
>
>
> KVM_VMEM_WAIT_READY: wait for KVM_VMEM_READY
>                       for qemu
> KVM_VMEM_READY: unblock KVM_VMEM_WAIT_READY
>                  for daemon uses
> These are for qemu and daemon to synchronise to enter postcopy stage.

These are eliminated if we fold the daemon into qemu.  Also, they could
just be a semaphore or other synchronization mechanism.
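
For instance, an eventfd inherited across fork() would do.  This is a
sketch under my own assumptions: the helper names are made up, and
eventfd(2) is just one choice of "other synchronization mechanism"
(Linux-only, which seems acceptable here):

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Daemon side: announce readiness (stands in for KVM_VMEM_READY). */
int signal_ready(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* qemu side: block until the daemon is ready
 * (stands in for KVM_VMEM_WAIT_READY). */
int wait_ready(int efd)
{
    uint64_t val;

    return read(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Demo: the child plays the daemon, the parent plays qemu. */
int demo_handshake(void)
{
    int efd = eventfd(0, 0);
    int rc, status;
    pid_t pid;

    if (efd < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(efd);
        return -1;
    }
    if (pid == 0) {
        /* The daemon would set up its mappings here before signalling. */
        _exit(signal_ready(efd) == 0 ? 0 : 1);
    }

    rc = wait_ready(efd);   /* blocks until the child has written */
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        rc = -1;
    close(efd);
    return rc;
}
```

If the daemon becomes a thread inside qemu, a plain semaphore or
condition variable serves the same purpose.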

>
> KVM_VMEM_GET_PAGE_REQUEST: retrieve page fault of qemu process

Equivalent to the fault callback of CUSE (if we add it)?

> KVM_VMEM_MARK_PAGE_CACHED: mark the specified pages pulled from the source
>                             for daemon

Equivalent to returning from that callback with a new page?
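
To illustrate what I mean by a callback, here is a userspace sketch.
Everything in it is hypothetical: CUSE has no such fault callback
today, and the type, function names, and fixed page size are
assumptions made only for the example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096u   /* assumed page size for this sketch */

/* Hypothetical callback: fill `page` with the contents for page
 * offset `pgoff`.  Returns 0 on success. */
typedef int (*fault_cb_t)(uint64_t pgoff, void *page, void *opaque);

/*
 * Hypothetical per-fault dispatcher.  Invoking the callback plays the
 * role of KVM_VMEM_GET_PAGE_REQUEST; a successful return plays the
 * role of KVM_VMEM_MARK_PAGE_CACHED for that page.
 */
int handle_fault(unsigned char *shmem, size_t npages, unsigned char *cached,
                 uint64_t pgoff, fault_cb_t cb, void *opaque)
{
    if (pgoff >= npages)
        return -1;
    if (cached[pgoff])
        return 0;               /* already pulled, nothing to do */
    if (cb(pgoff, shmem + pgoff * PAGE_SZ, opaque) != 0)
        return -1;
    cached[pgoff] = 1;          /* later faults on this page are skipped */
    return 0;
}

/* Stand-in for pulling the page from the migration source: fills the
 * page with the low byte of its offset. */
int fill_from_source(uint64_t pgoff, void *page, void *opaque)
{
    (void)opaque;
    memset(page, (int)(pgoff & 0xff), PAGE_SZ);
    return 0;
}
```

With this shape the two ioctls collapse into one call and its return.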

> KVM_VMEM_MAKE_PAGES_PRESENT: make the specified pages present in qemu
>                               virtual address space
>                               for daemon uses
> KVM_VMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
>                               anonymous
> 			     I'm not sure whether this can be implemented
>                               or not.
>
> I think the following workflow on the destination helps.
>
>          qemu on the destination
>                |
>                V
>          open(/dev/kvm)
>                |
>                V
>          KVM_CREATE_VMEM_DEV
>                |
>                V
>          Here we have two file descriptors to
>          vmem device and shmem file
>                |
>                |
>                |                                  daemon on the destination
>                V
>          fork()---------------------------------------,
>                |                                      |
>                V                                      |
>          close(socket)                                V
>          close(shmem)                              mmap(shmem file)
>                |                                      |
>                V                                      V
>          mmap(vmem device) for guest RAM           close(shmem file)
>                |                                      |
>                V                                      |
>          KVM_VMEM_WAIT_READY<---------------------KVM_VMEM_READY
>                |                                      |
>                V                                      |
>          close(vmem device)                        Here the daemon takes over
>                |                                   the owner of the socket
>          entering post copy stage                  to the source
>          start guest execution                        |
>                |                                      |
>                V                                      V
>          access guest RAM                          KVM_VMEM_GET_PAGE_REQUEST
>                |                                      |
>                V                                      V
>          page fault ------------------------------>page offset is returned
>          block                                        |
>                                                       V
>                                                    pull page from the source
>                                                    write the page contents
>                                                    to the shmem.
>                                                       |
>                                                       V
>          unblock<-----------------------------KVM_VMEM_MARK_PAGE_CACHED
>          the fault handler returns the page
>          page fault is resolved
>                |
>                |                                   pages can be pulled
>                |                                   in the background
>                |                                      |
>                |                                      V
>                |                                   KVM_VMEM_MARK_PAGE_CACHED
>                |                                      |
>                V                                      V
>          The specified pages<----------------------KVM_VMEM_MAKE_PAGES_PRESENT
>          are made present                             |
>          so future page fault is avoided.             |
>                |                                      |
>                V                                      V
>
>                   all the pages are pulled from the source
>
>                |                                      |
>                V                                      V
>          the vma becomes anonymous<----------------KVM_VMEM_MAKE_VMA_ANONYMOUS
>         (note: I'm not sure if this can be implemented or not)
>                |                                      |
>                V                                      V
>          migration completes                        exit()
>

Yes, thanks, this was very helpful.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

