Message-ID: <4ECA5FE2.1090905@linux.vnet.ibm.com>
Date: Mon, 21 Nov 2011 22:27:46 +0800
From: shu ming
Subject: Re: [Qemu-devel] [RFC] Consistent Snapshots Idea
To: Avi Kivity
Cc: Richard Laager, qemu-devel, kvm@vger.kernel.org
References: <1321876869.761.59.camel@watermelon.coderich.net> <4ECA44A3.2040200@redhat.com>
In-Reply-To: <4ECA44A3.2040200@redhat.com>

On 2011-11-21 20:31, Avi Kivity wrote:
> On 11/21/2011 02:01 PM, Richard Laager wrote:
>> I'm not an expert on the architecture of KVM, so perhaps this is a QEMU
>> question. If so, please let me know and I'll ask on a different list.
> It is a qemu question, yes (though fork()ing a guest also relates to kvm).
>
>> Background:
>>
>> Assuming the block layer can make instantaneous snapshots of a guest's
>> disk (e.g. lvcreate -s), one can get "crash consistent" (i.e. as if the
>> guest crashed) snapshots. To get a "fully consistent" snapshot, you need
>> to shutdown the guest. For production VMs, this is obviously not ideal.
>>
>> Idea:
>>
>> What if KVM/QEMU was to fork() the guest and shutdown one copy?
>>
>> KVM/QEMU would momentarily halt the execution of the guest and take a
>> writable, instantaneous snapshot of each block device. Then it would
>> fork(). The parent would resume execution as normal. The child would
>> redirect disk writes to the snapshot(s). The RAM should have
>> copy-on-write behavior as with any other fork()ed process. Other
>> resources like the network, display, sound, serial, etc. would simply be
>> disconnected/bit-bucketed. Finally, the child would resume guest
>> execution and send the guest an ACPI power button press event. This
>> would cause the guest OS to perform an orderly shutdown.
>>
>> I believe this would provide consistent snapshots in the vast majority
>> of real-world scenarios in a guest OS and application-independent way.
> Interesting idea. Will the guest actually shut down nicely without a
> network? Things like NFS mounts will break.

Do the child and parent processes run in parallel? What happens if the
parent process tries to access the block device while the child is
running? It looks like the child process will write to a snapshot file,
but where will the parent process write?

>
>> Implementation Nits:
>>
>>   * A timeout on the child process would likely be a good idea.
>>   * It'd probably be best to disconnect the network (i.e. tell the
>>     guest the cable is unplugged) to avoid long timeouts. Likewise
>>     for the hardware flow-control lines on the serial port.
> This is actually critical, otherwise the guest will shutdown(2) all
> sockets and confuse the clients.
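
To make the fork() semantics under discussion concrete, here is a minimal
sketch in Python. It is not QEMU code and says nothing about how a guest's
block devices or KVM state would be handled. It only shows that a fork()ed
child works on its own copy-on-write view of memory while the parent keeps
its state, and that the parent can enforce the timeout suggested in the
nits. The name run_in_forked_copy and its parameters are hypothetical.

```python
import os
import signal
import time


def run_in_forked_copy(work, timeout_s):
    """Fork, run work() in the child, and kill the child on timeout.

    Returns the child's exit code, or None if the child did not exit
    normally (for example because it was killed after timeout_s seconds).
    """
    pid = os.fork()
    if pid == 0:
        # Child: memory is a copy-on-write view taken at fork() time, so
        # anything it modifies is invisible to the parent.
        code = 1
        try:
            code = work()
        finally:
            # Exit without returning into the parent's code path.
            os._exit(code)

    # Parent: a real hypervisor would resume the guest here; this sketch
    # only polls for the child until the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        time.sleep(0.01)

    # Timeout: the "shutdown" copy is stuck, so discard it.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return None
```

If work() appends to a list that exists in the parent, the parent's list is
still unchanged after the call returns, which is the memory behaviour the
proposal relies on. Disk writes do not get this isolation for free, which
is why the proposal needs a separate writable snapshot per block device.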
>
>>   * For correctness, fdatasync()ing or similar might be necessary
>>     after halting execution and before creating the snapshots.
> Microsoft guests have an API to quiesce storage prior to a snapshot, and
> I think there is work to bring this to Linux guests. So it should be
> possible to get consistent snapshots even without this, but it takes
> more integration.
>
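
The ordering in the fdatasync() nit (flush first, snapshot second) can be
sketched on the host side as below. This is only an illustration under
stated assumptions: shutil.copyfile stands in for a real instantaneous
snapshot such as lvcreate -s, the function name write_flush_snapshot is
hypothetical, and it does not cover data still buffered inside the guest,
which is what the guest-side quiesce APIs mentioned above address.

```python
import os
import shutil

# os.fdatasync is not available on every platform; fall back to os.fsync.
_sync = getattr(os, "fdatasync", os.fsync)


def write_flush_snapshot(path, data, snapshot_path):
    """Write data, force it to stable storage, then take a stand-in snapshot."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()          # Python buffer -> kernel page cache
        _sync(f.fileno())  # kernel page cache -> storage device
    # Only after the flush is the on-disk image known to hold the data.
    shutil.copyfile(path, snapshot_path)
```

Taking the snapshot before the sync call would risk capturing an image
that is missing data the writer already considers written.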