From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37198)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1Xsoxu-0005PH-Dd
	for qemu-devel@nongnu.org; Mon, 24 Nov 2014 03:25:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1Xsoxo-0006A5-1P
	for qemu-devel@nongnu.org; Mon, 24 Nov 2014 03:25:46 -0500
Received: from szxga01-in.huawei.com ([119.145.14.64]:6697)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1Xsoxm-00069o-Sj
	for qemu-devel@nongnu.org; Mon, 24 Nov 2014 03:25:39 -0500
Message-ID: <5472EB84.70800@huawei.com>
Date: Mon, 24 Nov 2014 16:25:40 +0800
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
References: <1412358473-31398-1-git-send-email-dgilbert@redhat.com>
	<546EB5F3.6020600@huawei.com> <20141121185634.GJ4569@redhat.com>
In-Reply-To: <20141121185634.GJ4569@redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: yamahata@private.email.ne.jp, lilei@linux.vnet.ibm.com, quintela@redhat.com, cristian.klein@cs.umu.se, "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>, qemu-devel@nongnu.org, amit.shah@redhat.com, yanghy@cn.fujitsu.com

On 2014/11/22 2:56, Andrea Arcangeli wrote:
> On Fri, Nov 21, 2014 at 11:48:03AM +0800, zhanghailiang wrote:
>> Hi David,
>>
>> When I migrated a VM using postcopy, with the VM configured with the '-realtime mlock=on' option,
>> it failed and reported "postcopy_ram_hosttest: remap_anon_pages not available: File exists" on the destination.
>>
>> Is it a bug in the userfaultfd API?
>
> It's not userfaultfd related, but it's remap_anon_pages related (in
> the future mcopy_atomic or equivalent userfaultfd cmd) and
> MADV_DONTNEED related.
>
> If the destination qemu starts with mlockall(current|future), -EEXIST
> saves the day by noticing all not yet transferred pages were already
> present in the destination (as allocated zero pages). We can't trigger
> non-present faults (in userfaultfd) if the dst starts with mlockall.
>
> Furthermore if precopy has been run before postcopy (currently it's
> always the case as there's no way to specify the number of precopy
> passes to run before starting postcopy... in turn allowing to specify
> zero passes) the bitmap with the re-dirtied pages must be transferred
> to the destination before postcopy can start, and MADV_DONTNEED has to
> be used to zap those re-dirtied pages. But MADV_DONTNEED will fail
> with -EINVAL too, well before postcopy starts, if mlockall is set on the
> destination qemu.
>
> If you didn't fail at -EINVAL in the destination MADV_DONTNEED
> probably there wasn't any redirtied page.
>
> remap_anon_pages is extremely strict (unlike vma-mangling mremap that
> would just zap the dst range vma silently if it existed) so it cannot
> overwrite the guest memory and you get EEXIST (the strictness was
> intentional to eliminate the risk of any memory corruption if userland
> hits a bug like in this case).
>
> But it should have failed before with MADV_DONTNEED returning -EINVAL
> if there was any re-dirtied page between the last precopy pass and
> postcopy (I assume the guest was idle?).
>

You are right ;)
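
For anyone who wants to see the MADV_DONTNEED part in isolation, here is a
minimal sketch, assuming Linux. It is not QEMU code, and the helper name
dontneed_errno_on_locked_page() is made up for illustration. It locks a single
anonymous page with mlock() and then tries to zap it, which is roughly what
happens to re-dirtied pages on a destination that has already called mlockall.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU: map one anonymous page, lock it,
 * then try to discard it with MADV_DONTNEED.
 *
 * Returns the errno reported by madvise() (0 if it unexpectedly succeeds),
 * or -1 if the page could not be mapped or locked (for example because
 * RLIMIT_MEMLOCK is too low).
 */
int dontneed_errno_on_locked_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    size_t len = (size_t)page;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    int result = -1;
    if (mlock(p, len) == 0) {
        /* On Linux a locked range is rejected with EINVAL. */
        result = (madvise(p, len, MADV_DONTNEED) == 0) ? 0 : errno;
        munlock(p, len);
    }

    munmap(p, len);
    return result;
}
```

If the page can be locked, I would expect this to return EINVAL on Linux,
which matches the failure described above.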

> In short I think to fix this qemu should call mlockall in the
> destination only after postcopy is complete. There's no way to lock
> the memory in the destination if the memory still resides in the
> source so some userfault may have to happen (and if userfaults happen,
> it means we're not mlocked yet).
>

Got it, so this problem should be fixed in qemu. Thanks for your explanation.
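
Just to check my understanding of the suggested fix, here is a rough sketch of
deferring the lock until postcopy has completed. All names in it (struct
mlock_policy, enum incoming_phase, mlock_policy_update(), the counting stub)
are hypothetical and do not exist in QEMU; it only illustrates the ordering,
not how the real patch should look.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical phases of an incoming migration (not QEMU's real states). */
enum incoming_phase {
    PHASE_PRECOPY,
    PHASE_POSTCOPY_ACTIVE,
    PHASE_COMPLETE
};

/* Hypothetical bookkeeping for a deferred '-realtime mlock=on' request. */
struct mlock_policy {
    bool requested;             /* user asked for mlock=on */
    bool locked;                /* lock has already been taken */
    int (*lock_all)(int flags); /* normally mlockall(); injectable for tests */
};

/* Counting stub so the ordering can be exercised without privileges. */
int stub_calls;

int count_only_lock_all(int flags)
{
    (void)flags;
    stub_calls++;
    return 0;
}

/*
 * Take the lock only once the incoming migration is complete, so that
 * userfaults and MADV_DONTNEED still work while pages live on the source.
 * Returns 0 if there was nothing to do or the lock succeeded, -1 if the
 * lock call failed.
 */
int mlock_policy_update(struct mlock_policy *p, enum incoming_phase phase)
{
    if (!p->requested || p->locked) {
        return 0;
    }
    if (phase != PHASE_COMPLETE) {
        return 0; /* defer: memory may still reside on the source */
    }
    if (p->lock_all(MCL_CURRENT | MCL_FUTURE) != 0) {
        return -1;
    }
    p->locked = true;
    return 0;
}
```

In real use lock_all would point at mlockall() itself; the stub is only there
so the sketch can be tried without raising RLIMIT_MEMLOCK.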