Date: Sun, 15 Nov 2009 12:35:25 +0200
From: Avi Kivity
To: Fernando Luis Vázquez Cao
Cc: Andrea Arcangeli, Chris Wright, "大村圭 (oomura kei)",
    kvm@vger.kernel.org, Yoshiaki Tamura, qemu-devel@nongnu.org,
    Takuya Yoshikawa
Subject: [Qemu-devel] Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Message-ID: <4AFFD96D.5090100@redhat.com>
In-Reply-To: <4AF79242.20406@oss.ntt.co.jp>
References: <4AF79242.20406@oss.ntt.co.jp>

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:
>
> Kemari runs paired virtual machines in an active-passive configuration
> and achieves whole-system replication by continuously copying the
> state of the system (dirty pages and the state of the virtual devices)
> from the active node to the passive node. An interesting implication
> of this is that during normal operation only the active node is
> actually executing code.
>

Can you characterize the performance impact for various workloads?  I
assume you are running continuously in log-dirty mode.  Doesn't this
make memory-intensive workloads suffer?
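To make the cost concrete, this is roughly what continuous log-dirty
operation looks like at the KVM ioctl level.  A sketch only: the names
enable_dirty_logging()/sync_point() and the slot sizing are made up for
illustration, and qemu's real wrappers differ.

    /*
     * Dirty logging sketch: enable KVM_MEM_LOG_DIRTY_PAGES on a memory
     * slot, then fetch-and-clear the dirty bitmap at every
     * synchronization point.
     */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    #define SLOT_PAGES  (1UL << 20)      /* example: 4GB slot of 4K pages */

    static unsigned long dirty_bitmap[SLOT_PAGES / (8 * sizeof(unsigned long))];

    static void enable_dirty_logging(int vm_fd,
                                     struct kvm_userspace_memory_region *mem)
    {
        /* From here on, the first guest write to a page after each bitmap
         * fetch takes a write-protection fault -- that is the cost in
         * question. */
        mem->flags |= KVM_MEM_LOG_DIRTY_PAGES;
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem);
    }

    static void sync_point(int vm_fd, __u32 slot)
    {
        struct kvm_dirty_log log = {
            .slot = slot,
            .dirty_bitmap = dirty_bitmap,
        };

        /* Live migration does this for a handful of passes; here it would
         * happen at every synchronization point. */
        ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

        /* ... walk dirty_bitmap, send those pages plus device state ... */
    }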
>
> The synchronization process can be broken down as follows:
>
>   - Event tapping: On KVM all I/O generates a VMEXIT that is
>     synchronously handled by the Linux kernel monitor, i.e. KVM (it is
>     worth noting that this applies to virtio devices too, because they
>     use MMIO and PIO just like a regular PCI device).

Some I/O (virtio-based) is asynchronous, but you still have well-known
tap points within qemu.

>
>   - Notification to qemu: Taking a page from live migration's
>     playbook, the synchronization process is user-space driven, which
>     means that qemu needs to be woken up at each synchronization
>     point. That is already the case for qemu-emulated devices, but we
>     also have in-kernel emulators. To compound the problem, even for
>     user-space emulated devices accesses to coalesced MMIO areas
>     cannot be detected. As a consequence we need a mechanism to
>     communicate KVM-handled events to qemu.

Do you mean the ioapic, pic, and lapic?  Perhaps it's best to start with
those in userspace (-no-kvm-irqchip).

Why is access to those chips considered a synchronization point?

>   - Virtual machine synchronization: All the dirty pages since the
>     last synchronization point and the state of the virtual devices
>     are sent to the fallback node from the user-space qemu process.
>     For this the existing savevm infrastructure and KVM's dirty page
>     tracking capabilities can be reused. Regarding in-kernel devices,
>     with the likely advent of in-kernel virtio backends we need a
>     generic way to access their state from user-space, for which,
>     again, the kvm_run shared memory area could be used.

I wonder if you can pipeline dirty memory synchronization.  That is,
write-protect those pages that are dirty, start copying them to the
other side, and continue execution, copying memory if the guest faults
it again (rough sketch at the end of this mail).

How many pages do you copy per synchronization point for reasonably
difficult workloads?

-- 
error compiling committee.c: too many arguments to function
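P.S.: To make the pipelining idea above a bit more concrete, the ordering
I have in mind is roughly the following.  Hypothetical pseudo-C: none of
these helpers exist as userspace interfaces today, and only the dirty-log
fetch corresponds to a real ioctl (KVM_GET_DIRTY_LOG).

    /*
     * Pipelined dirty-memory synchronization, sketched.  The point is
     * only the ordering of the steps, not a real interface.
     */
    struct page_set;                                      /* opaque dirty-page set */

    extern struct page_set *fetch_dirty_log(void);        /* wraps KVM_GET_DIRTY_LOG   */
    extern void write_protect(struct page_set *set);      /* hypothetical              */
    extern void vcpu_resume(void);                        /* hypothetical              */
    extern unsigned long pop_page(struct page_set *set);  /* returns 0 when empty      */
    extern void send_page(unsigned long pfn);             /* copy to the fallback node */
    extern void allow_write(unsigned long pfn);           /* hypothetical              */
    extern void wait_for_fallback_ack(void);              /* hypothetical              */

    void pipelined_sync_point(void)
    {
        struct page_set *dirty = fetch_dirty_log();  /* pages touched since last sync */
        unsigned long pfn;

        write_protect(dirty);    /* trap further writes to the dirty set */
        vcpu_resume();           /* guest runs again while the copy is in flight */

        while ((pfn = pop_page(dirty)) != 0)
            send_page(pfn);      /* stream the epoch to the fallback node */

        wait_for_fallback_ack(); /* the synchronization point completes here */
    }

    /* Write fault on a page that is still queued: push it out (or re-queue
     * it) before the write is allowed. */
    void handle_write_fault(unsigned long pfn)
    {
        send_page(pfn);
        allow_write(pfn);
    }

The interesting bit is the fault handler: a page the guest re-dirties
while the copy is still in flight is sent (or re-queued) before the write
proceeds, so the fallback node never applies a half-copied epoch.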