From: Avi Kivity
Date: Mon, 08 Aug 2011 15:38:54 +0300
To: Isaku Yamahata
Cc: Andrea Arcangeli, t.hirofuchi@aist.go.jp, qemu-devel@nongnu.org, kvm@vger.kernel.org, satoshi.itoh@aist.go.jp
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
Message-ID: <4E3FD8DE.6060508@redhat.com>
In-Reply-To: <20110808032438.GC24764@valinux.co.jp>

On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
> This mail is about "Yabusame: Postcopy Live Migration for Qemu/KVM",
> on which we'll give a talk at KVM Forum.
> The purpose of this mail is to let developers know about it in advance,
> so that we can get better feedback on its design/implementation approach
> before we start implementing it.

Interesting; what is the impact of increased latency on memory reads?

> There are several design points.
>
> - who takes care of pulling page contents
>   an independent daemon vs a thread in qemu
>   The daemon approach is preferable because an independent daemon would
>   make it easy to debug the postcopy memory mechanism without qemu.
>   If required, it wouldn't be difficult to convert the daemon into
>   a thread in qemu.

Isn't this equivalent to touching each page in sequence? Care must be
taken that we don't post too many requests, or it could affect the
latency of synchronous accesses by the guest.

> - connection between the source and the destination
>   The connection for live migration can be re-used after sending the
>   machine state.
>
> - transfer protocol
>   The existing protocol can be extended.
>
> - hooking guest RAM access
>   Introduce a character device to handle page faults.
>   When a page fault occurs, it queues a page request to the user space
>   daemon at the destination. The daemon pulls the page contents from
>   the source and feeds them into the character device. Then the page
>   fault is resolved.

This doesn't play well with host swapping, transparent hugepages, or
ksm, does it? I see you note this later on.

> * More on hooking guest RAM access
> There are several candidates for the implementation. Our preference is
> the character device approach.
>
> - inserting hooks everywhere in qemu/kvm
>   This is impractical.
>
> - backing store for guest RAM
>   A block device or a file can be used to back guest RAM, and thus
>   hook guest RAM access.
>
>   pros
>   - no new device driver is needed
>   cons
>   - future improvement would be difficult
>   - some KVM host features (KSM, THP) wouldn't work
>
> - character device
>   qemu mmap()s the dedicated character device and then hooks page faults.
>
>   pros
>   - straightforward approach
>   - future improvement would be easy
>   cons
>   - a new driver is needed
>   - some KVM host features (KSM, THP) wouldn't work

They check whether a given VMA is anonymous. This can be fixed.
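For concreteness, here is a rough sketch of how the destination-side
daemon could drive such a character device. The device node name
(/dev/postcopy_umem), struct page_request, and fetch_page() are invented
for this sketch; they are not an existing driver or protocol interface.

/* Sketch only: the driver is assumed to queue faulting guest frame
 * numbers to the daemon; the daemon fetches each page from the
 * migration source and writes it back, which populates the page and
 * resolves the fault. */
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

#define PAGE_SIZE 4096

struct page_request {
    uint64_t gfn;                /* guest frame number that faulted */
};

/* Speaks the (extended) migration protocol to the source; stubbed out. */
extern int fetch_page(uint64_t gfn, void *buf);

int serve_postcopy(const char *devpath)
{
    int fd = open(devpath, O_RDWR);
    if (fd < 0)
        return -1;

    struct page_request req;
    char page[PAGE_SIZE];

    /* Each read blocks until a vcpu faults on a not-yet-transferred page. */
    while (read(fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        if (fetch_page(req.gfn, page) < 0)
            break;
        /* Writing the contents back wakes the blocked vcpu. */
        if (write(fd, page, sizeof(page)) != (ssize_t)sizeof(page))
            break;
    }

    close(fd);
    return 0;
}

Whether the daemon also prefetches pages in the background, to bound the
number of demand faults (the latency concern above), is an orthogonal
choice.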
> - swap device
>   When creating the guest, it is set up as if all the guest RAM were
>   swapped out to a dedicated swap device, which may be an nbd disk (or
>   some kind of user space block device, BUSE?).
>   When the VM tries to access memory, swap-in is triggered and IO to the
>   swap device is issued. The IO to the swap device is then routed to the
>   daemon in user space via the nbd protocol (or BUSE, AOE, iSCSI...).
>   The daemon pulls pages from the migration source and services the IO
>   request.
>
>   pros
>   - after the page transfer is complete, everything is the same as in
>     the normal case
>   - no new device driver is needed
>   cons
>   - future improvement would be difficult
>   - administration: setting up nbd and the swap device

Using a swap device would be my preference. We'd still be using
anonymous memory, so thp/ksm/ordinary swap still work. It would need to
be a special kind of swap device, since we only want to swap in, and
never out, to that device. We'd also need a special way of telling the
kernel that memory comes from that device. In that respect it's similar
to your second option.

Maybe we should use a backing file (using nbd) and have a madvise() call
that converts the vma to anonymous memory once the migration is finished.

-- 
error compiling committee.c: too many arguments to function
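For illustration, a rough sketch of that last backing-file idea.
MADV_CONVERT_ANON below is not a real madvise() flag; it stands in for
whatever new kernel interface would detach the vma from the nbd-backed
file once postcopy has finished.

/* Sketch only: guest RAM starts as a private mapping of a file served
 * over nbd by the daemon, so first-touch faults become reads against
 * the migration source. MADV_CONVERT_ANON is a hypothetical madvise()
 * extension; it does not exist today. */
#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

#define MADV_CONVERT_ANON 100        /* hypothetical flag value */

void *map_guest_ram(const char *nbd_backed_file, size_t ram_size)
{
    int fd = open(nbd_backed_file, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    /* MAP_PRIVATE gives copy-on-write semantics, so guest writes never
     * go back to the nbd device. */
    void *ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping keeps its own reference */
    return ram;
}

int finish_postcopy(void *ram, size_t ram_size)
{
    /* Hypothetical call: turn the file-backed vma into ordinary
     * anonymous memory so THP/KSM/regular swap apply again. */
    return madvise(ram, ram_size, MADV_CONVERT_ANON);
}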