Date: Sun, 15 Nov 2009 12:35:25 +0200
From: Avi Kivity
To: Fernando Luis Vázquez Cao
Cc: Andrea Arcangeli, Chris Wright, "大村圭 (oomura kei)",
    kvm@vger.kernel.org, Yoshiaki Tamura, qemu-devel@nongnu.org,
    Takuya Yoshikawa
Subject: [Qemu-devel] Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Message-ID: <4AFFD96D.5090100@redhat.com>
In-Reply-To: <4AF79242.20406@oss.ntt.co.jp>
References: <4AF79242.20406@oss.ntt.co.jp>

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:
>
> Kemari runs paired virtual machines in an active-passive configuration
> and achieves whole-system replication by continuously copying the
> state of the system (dirty pages and the state of the virtual devices)
> from the active node to the passive node. An interesting implication
> of this is that during normal operation only the active node is
> actually executing code.
>

Can you characterize the performance impact for various workloads?  I
assume you are running continuously in log-dirty mode.  Doesn't this
make memory-intensive workloads suffer?
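To make the cost concrete, this is roughly what continuous log-dirty
operation looks like at the KVM ioctl level.  A sketch only: the names
enable_dirty_logging()/sync_point() and the slot sizing are made up for
illustration, and qemu's real wrappers differ.

    /*
     * Dirty logging sketch: enable KVM_MEM_LOG_DIRTY_PAGES on a memory
     * slot, then fetch-and-clear the dirty bitmap at every
     * synchronization point.
     */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    #define SLOT_PAGES  (1UL << 20)      /* example: 4GB slot of 4K pages */

    static unsigned long dirty_bitmap[SLOT_PAGES / (8 * sizeof(unsigned long))];

    static void enable_dirty_logging(int vm_fd,
                                     struct kvm_userspace_memory_region *mem)
    {
        /* From here on, the first guest write to a page after each bitmap
         * fetch takes a write-protection fault -- that is the cost in
         * question. */
        mem->flags |= KVM_MEM_LOG_DIRTY_PAGES;
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem);
    }

    static void sync_point(int vm_fd, __u32 slot)
    {
        struct kvm_dirty_log log = {
            .slot = slot,
            .dirty_bitmap = dirty_bitmap,
        };

        /* Live migration does this for a handful of passes; here it would
         * happen at every synchronization point. */
        ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

        /* ... walk dirty_bitmap, send those pages plus device state ... */
    }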
>
> The synchronization process can be broken down as follows:
>
>   - Event tapping: On KVM all I/O generates a VMEXIT that is
>     synchronously handled by the Linux kernel monitor, i.e. KVM (it is
>     worth noting that this applies to virtio devices too, because they
>     use MMIO and PIO just like a regular PCI device).

Some I/O (virtio-based) is asynchronous, but you still have well-known
tap points within qemu.

>
>   - Notification to qemu: Taking a page from live migration's
>     playbook, the synchronization process is user-space driven, which
>     means that qemu needs to be woken up at each synchronization
>     point. That is already the case for qemu-emulated devices, but we
>     also have in-kernel emulators. To compound the problem, even for
>     user-space emulated devices accesses to coalesced MMIO areas
>     cannot be detected. As a consequence we need a mechanism to
>     communicate KVM-handled events to qemu.

Do you mean the ioapic, pic, and lapic?  Perhaps it's best to start with
those in userspace (-no-kvm-irqchip).

Why is access to those chips considered a synchronization point?

>   - Virtual machine synchronization: All the dirty pages since the
>     last synchronization point and the state of the virtual devices
>     are sent to the fallback node from the user-space qemu process.
>     For this the existing savevm infrastructure and KVM's dirty page
>     tracking capabilities can be reused. Regarding in-kernel devices,
>     with the likely advent of in-kernel virtio backends we need a
>     generic way to access their state from user-space, for which,
>     again, the kvm_run shared memory area could be used.

I wonder if you can pipeline dirty memory synchronization.  That is,
write-protect those pages that are dirty, start copying them to the
other side, and continue execution, copying memory if the guest faults
it again (rough sketch at the end of this mail).

How many pages do you copy per synchronization point for reasonably
difficult workloads?

-- 
error compiling committee.c: too many arguments to function
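P.S.: To make the pipelining idea above a bit more concrete, the ordering
I have in mind is roughly the following.  Hypothetical pseudo-C: none of
these helpers exist as userspace interfaces today, and only the dirty-log
fetch corresponds to a real ioctl (KVM_GET_DIRTY_LOG).

    /*
     * Pipelined dirty-memory synchronization, sketched.  The point is
     * only the ordering of the steps, not a real interface.
     */
    struct page_set;                                      /* opaque dirty-page set */

    extern struct page_set *fetch_dirty_log(void);        /* wraps KVM_GET_DIRTY_LOG   */
    extern void write_protect(struct page_set *set);      /* hypothetical              */
    extern void vcpu_resume(void);                        /* hypothetical              */
    extern unsigned long pop_page(struct page_set *set);  /* returns 0 when empty      */
    extern void send_page(unsigned long pfn);             /* copy to the fallback node */
    extern void allow_write(unsigned long pfn);           /* hypothetical              */
    extern void wait_for_fallback_ack(void);              /* hypothetical              */

    void pipelined_sync_point(void)
    {
        struct page_set *dirty = fetch_dirty_log();  /* pages touched since last sync */
        unsigned long pfn;

        write_protect(dirty);    /* trap further writes to the dirty set */
        vcpu_resume();           /* guest runs again while the copy is in flight */

        while ((pfn = pop_page(dirty)) != 0)
            send_page(pfn);      /* stream the epoch to the fallback node */

        wait_for_fallback_ack(); /* the synchronization point completes here */
    }

    /* Write fault on a page that is still queued: push it out (or re-queue
     * it) before the write is allowed. */
    void handle_write_fault(unsigned long pfn)
    {
        send_page(pfn);
        allow_write(pfn);
    }

The interesting bit is the fault handler: a page the guest re-dirties
while the copy is still in flight is sent (or re-queued) before the write
proceeds, so the fallback node never applies a half-copied epoch.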