From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH][RFC] post copy chardevice (was Re: [RFC] postcopy livemigration proposal)
Date: Tue, 16 Aug 2011 06:40:35 -0700
Message-ID: <4E4A7353.9030708@redhat.com>
References: <20110808032438.GC24764@valinux.co.jp> <20110812110737.GA13791@valinux.co.jp> <4E4973A1.2040008@redhat.com> <20110816014226.GJ13791@valinux.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp, Andrea Arcangeli, Stefan Hajnoczi, Dor Laor, Yaniv Kaul, Anthony Liguori, "Nadav Har'El"
To: Isaku Yamahata
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:36219 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751377Ab1HPNk4 (ORCPT ); Tue, 16 Aug 2011 09:40:56 -0400
In-Reply-To: <20110816014226.GJ13791@valinux.co.jp>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 08/15/2011 06:42 PM, Isaku Yamahata wrote:
> On Mon, Aug 15, 2011 at 12:29:37PM -0700, Avi Kivity wrote:
> > On 08/12/2011 04:07 AM, Isaku Yamahata wrote:
> >> This is a character device to hook page access.
> >> The page fault in the area is reported to another user process by
> >> this chardriver. Then, the process fills the page contents and
> >> resolves the page fault.
> >
> > Have you considered CUSE (character device in userspace, fs/fuse/cuse.c)?
>
> Looking at dev.c and cuse.c, it doesn't seem to support mmap or a
> fault handler.

If its performance is sufficient, CUSE would be the preferred path: it is
better to enhance an existing API, which can be useful to others, than to
add a new one.

> >> +
> >> +struct kvm_vmem_make_pages_present {
> >> +	__u32 nr;
> >> +	struct kvm_vmem_page_range __user *ranges;
> >> +};
> >
> > This is madvise(MADV_WILLNEED), is it not?
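To make the madvise(MADV_WILLNEED) comparison concrete, here is a minimal C sketch in which qemu walks a list of page ranges and issues the advice on its own guest-RAM mapping. It rests on several assumptions. The excerpt does not show the fields of `struct kvm_vmem_page_range`, so `struct vmem_page_range` with `pgoff`/`npages` is hypothetical. The helper `prefetch_ranges()` is also hypothetical. MADV_WILLNEED is only a hint: the kernel may read ahead, but nothing guarantees that the pages end up mapped.

```c
#define _DEFAULT_SOURCE /* exposes MADV_WILLNEED / MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical range layout: the real kvm_vmem_page_range fields are
 * not shown in the patch excerpt. */
struct vmem_page_range {
    uint64_t pgoff;  /* first page, relative to the start of guest RAM */
    uint64_t npages; /* number of pages in the range */
};

/* Apply madvise(MADV_WILLNEED) to each range of the guest RAM mapped at
 * `base` (`len` bytes). Returns 0 on success, -EINVAL if a range falls
 * outside the mapping, or -errno from the first failing madvise(). */
static int prefetch_ranges(void *base, size_t len,
                           const struct vmem_page_range *ranges, uint32_t nr)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t page = (size_t)ps;
    size_t max_pages = len / page;

    for (uint32_t i = 0; i < nr; i++) {
        /* Bounds check written so that it cannot overflow. */
        if (ranges[i].pgoff > max_pages ||
            ranges[i].npages > max_pages - ranges[i].pgoff)
            return -EINVAL;
        size_t off = (size_t)ranges[i].pgoff * page;
        size_t n = (size_t)ranges[i].npages * page;
        /* Only a hint to the kernel; no page is guaranteed to be mapped. */
        if (madvise((char *)base + off, n, MADV_WILLNEED) != 0)
            return -errno;
    }
    return 0;
}
```

A caller would pass the base and length of its guest RAM mapping plus the ranges array, for example `prefetch_ranges(ram, len, ranges, nr)`.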
>
> Another process, not the qemu process, issues it,
> and it makes the pages present in the qemu process address space.

That process just issues these calls in a loop until all memory is
present, yes? It seems those few lines could easily be added to qemu.

> >
> > Can you explain these in some more detail?
> >
> KVM_CREATE_VMEM_DEV: create vmem-dev device from kvm device
>                      for qemu
> KVM_CREATE_VMEM: create vmem device from vmem-dev device.
>                  (note: qemu creates more than one memory region.)
>
>
> KVM_VMEM_WAIT_READY: wait for KVM_VMEM_READY
>                      for qemu
> KVM_VMEM_READY: unblock KVM_VMEM_WAIT_READY
>                 for daemon use
> These are for qemu and the daemon to synchronise to enter the postcopy stage.

These are eliminated if we fold the daemon into qemu. Also, qemu and the
daemon could just use a semaphore or another synchronization mechanism.

>
> KVM_VMEM_GET_PAGE_REQUEST: retrieve page fault of qemu process

Equivalent to the fault callback of CUSE (if we add it)?

> KVM_VMEM_MARK_PAGE_CACHED: mark the specified pages pulled from the source
>                            for daemon

Equivalent to returning from that callback with a new page?

> KVM_VMEM_MAKE_PAGES_PRESENT: make the specified pages present in qemu
>                              virtual address space
>                              for daemon use
> KVM_VMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
>                              anonymous
>                              I'm not sure whether this can be implemented
>                              or not.
>
> I think the following workflow on the destination helps.
>
> qemu on the destination
>       |
>       V
> open(/dev/kvm)
>       |
>       V
> KVM_CREATE_VMEM_DEV
>       |
>       V
> Here we have two file descriptors to
> vmem device and shmem file
>       |
>       |
>       |                                  daemon on the destination
>       V
> fork()---------------------------------------,
>       |                                      |
>       V                                      |
> close(socket)                                V
> close(shmem)                         mmap(shmem file)
>       |                                      |
>       V                                      V
> mmap(vmem device) for guest RAM      close(shmem file)
>       |                                      |
>       V                                      |
> KVM_VMEM_READY_WAIT<---------------------KVM_VMEM_READY
>       |                                      |
>       V                                      |
> close(vmem device)                 Here the daemon takes over
>       |                            the owner of the socket
> entering post copy stage           to the source
> start guest execution                        |
>       |                                      |
>       V                                      V
> access guest RAM                  KVM_VMEM_GET_PAGE_REQUEST
>       |                                      |
>       V                                      V
> page fault ------------------------------>page offset is returned
> block                                        |
>                                              V
>                                   pull page from the source
>                                   write the page contents
>                                   to the shmem.
>                                              |
>                                              V
> unblock<-----------------------------KVM_VMEM_MARK_PAGE_CACHED
> the fault handler returns the page
> page fault is resolved
>       |
>       |                           pages can be pulled
>       |                           in the background
>       |                                      |
>       |                                      V
>       |                           KVM_VMEM_MARK_PAGE_CACHED
>       |                                      |
>       V                                      V
> The specified pages<----------------------KVM_VMEM_MAKE_PAGES_PRESENT
> are made present                             |
> so future page faults are avoided.           |
>       |                                      |
>       V                                      V
>
>              all the pages are pulled from the source
>
>       |                                      |
>       V                                      V
> the vma becomes anonymous<----------------KVM_VMEM_MAKE_VMA_ANONYMOUS
> (note: I'm not sure if this can be implemented or not)
>       |                                      |
>       V                                      V
> migration completes                        exit()
>

Yes, thanks, this was very helpful.

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
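The destination workflow in the diagram can be modelled in user space to make its request/reply protocol concrete. This is a minimal C simulation under explicit assumptions, not the vmem driver from the patch:

- No real page faults are trapped; qemu's "fault" is an explicit request.
- The KVM_VMEM_* ioctls are replaced by messages on two pipes.
- "Pulling a page from the source" is replaced by filling the page with a computed pattern.
- The ready handshake is a one-byte pipe message, following the remark that an ordinary synchronization mechanism would do.

The helpers `page_pattern()`, `xfer()`, `daemon_loop()` and `run_postcopy_sim()` are hypothetical names.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS in glibc's <sys/mman.h> */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for "pull page from the source": contents are
 * derived from the page offset so the requester can check them. */
static unsigned char page_pattern(uint64_t pgoff)
{
    return (unsigned char)(pgoff % 251 + 1);
}

/* Read or write exactly n bytes; returns 0 on success, -1 on error or EOF. */
static int xfer(int fd, void *buf, size_t n, int is_write)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = is_write ? write(fd, p, n) : read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Daemon side (child); never returns. The ready byte stands in for
 * KVM_VMEM_READY; request/reply stand in for KVM_VMEM_GET_PAGE_REQUEST
 * and KVM_VMEM_MARK_PAGE_CACHED. */
static void daemon_loop(unsigned char *ram, size_t psz, size_t npages,
                        int req_fd, int rep_fd)
{
    unsigned char ready = 'R';
    if (xfer(rep_fd, &ready, 1, 1) != 0)
        _exit(1);
    uint64_t pgoff;
    while (xfer(req_fd, &pgoff, sizeof pgoff, 0) == 0) {
        if (pgoff >= npages)
            _exit(1);
        memset(ram + pgoff * psz, page_pattern(pgoff), psz); /* "pull" page */
        if (xfer(rep_fd, &pgoff, sizeof pgoff, 1) != 0)
            _exit(1);
    }
    _exit(0); /* request pipe closed by qemu: the daemon's exit() */
}

/* qemu side (parent). Returns how many pages match page_pattern(),
 * or -1 if setup fails. */
static long run_postcopy_sim(size_t npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t psz = (size_t)ps, len = npages * psz;
    /* Shared anonymous mapping plays the role of the shmem-backed guest RAM. */
    unsigned char *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED)
        return -1;
    int req[2] = {-1, -1}, rep[2] = {-1, -1};
    pid_t pid = -1;
    if (pipe(req) == 0 && pipe(rep) == 0)
        pid = fork();
    if (pid < 0) { /* setup failed: release whatever was created */
        for (int i = 0; i < 2; i++) {
            if (req[i] >= 0) close(req[i]);
            if (rep[i] >= 0) close(rep[i]);
        }
        munmap(ram, len);
        return -1;
    }
    if (pid == 0) { /* child: the daemon */
        close(req[1]);
        close(rep[0]);
        daemon_loop(ram, psz, npages, req[0], rep[1]); /* does not return */
    }
    close(req[0]);
    close(rep[1]);

    long good = 0;
    unsigned char ready;
    if (xfer(rep[0], &ready, 1, 0) == 0) { /* stands in for KVM_VMEM_WAIT_READY */
        for (uint64_t pg = 0; pg < npages; pg++) {
            /* A real vmem device blocks the faulting access in the kernel;
             * here the "page fault" is an explicit request and reply. */
            uint64_t ack;
            if (xfer(req[1], &pg, sizeof pg, 1) != 0 ||
                xfer(rep[0], &ack, sizeof ack, 0) != 0 || ack != pg)
                break;
            const unsigned char *page = ram + pg * psz;
            if (page[0] == page_pattern(pg) && page[psz - 1] == page_pattern(pg))
                good++;
        }
    }
    close(req[1]); /* EOF on the request pipe lets the daemon exit */
    close(rep[0]);
    waitpid(pid, NULL, 0);
    munmap(ram, len);
    return good;
}
```

`run_postcopy_sim(4)` returns 4 when every page arrives intact. Folding the daemon into qemu could turn the fork() into a thread while keeping the same request/reply shape.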