Message-ID: <514D3142.9050000@linux.vnet.ibm.com>
Date: Sat, 23 Mar 2013 12:36:18 +0800
From: Wenchao Xia
References: <5142CCB6.7000004@linux.vnet.ibm.com> <51471686.3030505@redhat.com> <514AABFC.1030605@linux.vnet.ibm.com> <514AF393.8030109@redhat.com> <20130321133802.GA15276@stefanha-thinkpad.redhat.com> <514B0E3F.5070309@redhat.com> <20130321145630.GA16677@stefanha-thinkpad.redhat.com> <514B2277.8070502@redhat.com>
In-Reply-To: <514B2277.8070502@redhat.com>
Subject: Re: [Qemu-devel] [RFC] qmp interface for save vmstate to image
To: Eric Blake
Cc: Kevin Wolf, Pavel Hrdina, Juan Quintela, Stefan Hajnoczi, qemu-devel, Paolo Bonzini, Dietmar Maurer

On 2013-3-21 23:08, Eric Blake wrote:
> On 03/21/2013 08:56 AM, Stefan Hajnoczi wrote:
>> On Thu, Mar 21, 2013 at 02:42:23PM +0100, Paolo Bonzini wrote:
>>> On 21/03/2013 14:38, Stefan Hajnoczi wrote:
>>>> There already is a guest RAM cloning mechanism: fork the QEMU process.
>>>> Then you have a copy-on-write guest RAM.
>>>>
>>>> In a little more detail:
>>>>
>>>> 1. save non-RAM device state
>>>> 2. quiesce QEMU to a state that is safe for forking
>>>> 3. create an EventNotifier for live savevm completion signal
>>>> 4. fork and pass completion EventNotifier to child
>>>> 5. parent continues running VM
>>>> 6. child performs vmsave of copy-on-write guest RAM
>>>> 7. child signals completion EventNotifier and terminates
>>>> 8. parent raises live savevm completion QMP event
>>>
>>> Forking a threaded program is not so easy, but it could be done if the
>>> child is very simple and only uses syscalls to communicate back with the
>>> parent:
>>
>> On Linux you should be able to use clone(2) to spawn a thread with
>> copy-on-write memory. Too bad it's not portable because it gets around
>> the messy fork issues.
>
> And introduces its own messy issues - once you clone() using different
> flags than what fork() does, you have invalidated the use of a LOT of
> libc interfaces in that child; in particular, any use of pthread is
> liable to break.
>

I think the core of the fork() approach is snapshotting RAM pages in
RAM, just like LVM2's block snapshot. Very cool idea :). The problem is
the implementation; an API like the following is needed:

void *mem_snapshot(void *addr, uint64_t len);

After a brief search I haven't found it on Linux, and I am not sure
whether it is available in the upstream Linux kernel or C library.
Making this API available and then using it in qemu would be much nicer.
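
To make the quoted fork() steps concrete, here is a minimal sketch of
the copy-on-write idea in plain C. It is not QEMU code. The names
Snapshot, snapshot_start() and snapshot_finish() are hypothetical, a
pipe stands in for the EventNotifier, and the child streams the bytes
back to the parent instead of writing an image file. It also assumes
the memory is a private mapping (shared mappings are not copy-on-write
across fork()) and a single-threaded caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one in-flight snapshot (not a QEMU type). */
typedef struct {
    pid_t pid; /* child that holds the copy-on-write view */
    int fd;    /* read end of the pipe the child writes to */
} Snapshot;

/*
 * Hypothetical helper: fork a child that streams [addr, addr + len) into a
 * pipe. The child sees the memory as it was at fork() time.
 * Returns 0 on success, -1 on error.
 */
int snapshot_start(Snapshot *s, const uint8_t *addr, size_t len)
{
    int fds[2];

    assert(s != NULL && addr != NULL);
    if (pipe(fds) < 0) {
        return -1;
    }
    s->pid = fork();
    if (s->pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (s->pid == 0) {
        /* Child: stick to plain syscalls, then leave via _exit(). */
        size_t done = 0;

        close(fds[0]);
        while (done < len) {
            ssize_t n = write(fds[1], addr + done, len - done);
            if (n < 0) {
                if (errno == EINTR) {
                    continue;
                }
                _exit(1);
            }
            done += (size_t)n;
        }
        _exit(0);
    }
    /* Parent: keep only the read end and return to the caller at once. */
    close(fds[1]);
    s->fd = fds[0];
    return 0;
}

/*
 * Hypothetical helper: collect the child's bytes into out, then reap it.
 * End of data plus exit status 0 plays the role of the completion signal.
 */
int snapshot_finish(Snapshot *s, uint8_t *out, size_t len)
{
    size_t got = 0;
    int status = 0;

    assert(s != NULL && out != NULL);
    while (got < len) {
        ssize_t n = read(s->fd, out + got, len - got);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            break; /* error or premature end of file */
        }
        got += (size_t)n;
    }
    close(s->fd);
    while (waitpid(s->pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (got != len || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return 0;
}
```

If the parent overwrites the buffer between snapshot_start() and
snapshot_finish(), the bytes it reads back should still be the old
contents, because the child's pages were frozen at fork() time.
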
It is very challenging to use the fork()/clone() approach in qemu. I
guess there will be a lot of scattered code preparing for fork(), plus
resource-handling code after fork(), code to query progress, exception
handling, a child/parent communication mechanism... ah, it seems
complex. But I am looking forward to seeing how good it is.

By comparison, migration to image uses less memory at the cost of more
I/O, but it is much easier to implement and is portable. Maybe it can
serve as a simple improvement to "migrate to fd" until an underlying
memory snapshot API is available.

-- 
Best Regards

Wenchao Xia