From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4E428DD4.9030806@redhat.com>
Date: Wed, 10 Aug 2011 16:55:32 +0300
From: Avi Kivity
MIME-Version: 1.0
References: <20110808032438.GC24764@valinux.co.jp>
 <4E3FD8DE.6060508@redhat.com>
 <20110809023305.GG25667@valinux.co.jp>
In-Reply-To: <20110809023305.GG25667@valinux.co.jp>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
To: Isaku Yamahata
Cc: Andrea Arcangeli, t.hirofuchi@aist.go.jp, qemu-devel@nongnu.org,
 kvm@vger.kernel.org, satoshi.itoh@aist.go.jp

On 08/09/2011 05:33 AM, Isaku Yamahata wrote:
> On Mon, Aug 08, 2011 at 03:38:54PM +0300, Avi Kivity wrote:
> > On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
> >> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
> >> on which we'll give a talk at KVM-forum.
> >> The purpose of this mail is to let developers know about it in advance
> >> so that we can get better feedback on its design/implementation approach
> >> early, before we start implementing it.
> >
> > Interesting; what is the impact of increased latency on memory reads?
>
> Many people have already discussed it at length in another thread. :-)
> That's much more than I expected.

Can you point me to the discussion?

>
> >> There are several design points.
> >> - who takes care of pulling page contents.
> >>   an independent daemon vs a thread in qemu
> >>   The daemon approach is preferable because an independent daemon would
> >>   make it easy to debug the postcopy memory mechanism without qemu.
> >>   If required, it wouldn't be difficult to convert a daemon into
> >>   a thread in qemu
> >
> > Isn't this equivalent to touching each page in sequence?
>
> No. I don't get the point of this question.

If you have a qemu thread that does

    for (each guest page)
        sum += *(char *)page;

doesn't that effectively pull all pages from the source node?

(but maybe I'm assuming that the kernel takes care of things and this
isn't the case?)

> >>
> >> - hooking guest RAM access
> >>   Introduce a character device to handle page faults.
> >>   When a page fault occurs, it queues a page request to the user space
> >>   daemon at the destination. The daemon pulls the page contents from the
> >>   source and serves them into the character device. Then the page fault
> >>   is resolved.
> >
> > This doesn't play well with host swapping, transparent hugepages, or
> > ksm, does it?
>
> No. At least it wouldn't be so difficult to fix it, though I haven't
> looked at ksm or thp closely.
> Although the vma is backed by the device, the populated page is
> anonymous. (by MAP_PRIVATE or the driver returning an anonymous page)
> So swapping, thp, ksm should work.

I'm not 100% sure, but I think that thp and ksm need the vma to be
anonymous, not just the page.

> >
> > It would need to be a special kind of swap device since we only want to
> > swap in, and never out, to that device. We'd also need a special way of
> > telling the kernel that memory comes from that device. In that it's
> > similar to your second option.
> >
> > Maybe we should use a backing file (using nbd) and have a madvise() call
> > that converts the vma to anonymous memory once the migration is finished.
>
> Whichever option we choose, I'd like to convert the vma into an anonymous
> area after the migration completes somehow, i.e.
> nulling vma->vm_ops.
> (The pages are already anonymous.)
>
> It seems troublesome, involving complicated races/locking. So I'm not sure
> it's worthwhile.

Andrea, what's your take on this?

-- 
error compiling committee.c: too many arguments to function