From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH V1 0/6] mm: add a new option MREMAP_DUP to mmrep syscall
Date: Mon, 06 Jan 2014 15:41:52 +0800
Message-ID: <52CA5E40.6040603@linux.vnet.ibm.com>
References: <1368093011-4867-1-git-send-email-wenchaolinux@gmail.com> <20130509141329.GC11497@suse.de> <518C5B5E.4010706@gmail.com> <CAJSP0QULp5c3tWwZ4ipWn6wS3YWauE07Bmd8nzjp8CJhWaD_oQ@mail.gmail.com> <52AFE828.3010500@linux.vnet.ibm.com> <20131230202342.GA7973@amt.cnet> <943AC3BD-C4EB-4B6C-BE34-AB921938AAF0@linux.vnet.ibm.com> <20131231185328.GA22414@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: Stefan Hajnoczi <stefanha@gmail.com>, wenchao <wenchaolinux@gmail.com>,
        Mel Gorman <mgorman@suse.de>, linux-mm@kvack.org,
        Andrew Morton <akpm@linux-foundation.org>, hughd@google.com,
        walken@google.com, Alexander Viro <viro@zeniv.linux.org.uk>,
        kirill.shutemov@linux.intel.com,
        Anthony Liguori <anthony@codemonkey.ws>, KVM <kvm@vger.kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <owner-linux-mm@kvack.org>
In-Reply-To: <20131231185328.GA22414@amt.cnet>
Sender: owner-linux-mm@kvack.org
List-Id: kvm.vger.kernel.org

On 01/01/2014 02:53 AM, Marcelo Tosatti wrote:
> On Tue, Dec 31, 2013 at 08:06:51PM +0800, Xiao Guangrong wrote:
>>
>> On Dec 31, 2013, at 4:23 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>
>>> On Tue, Dec 17, 2013 at 01:59:04PM +0800, Xiao Guangrong wrote:
>>>>
>>>> CCed KVM guys.
>>>>
>>>> On 05/10/2013 01:11 PM, Stefan Hajnoczi wrote:
>>>>> On Fri, May 10, 2013 at 4:28 AM, wenchao <wenchaolinux@gmail.com> wrote:
>>>>>> On 2013-5-9 22:13, Mel Gorman wrote:
>>>>>>
>>>>>>> On Thu, May 09, 2013 at 05:50:05PM +0800, wenchaolinux@gmail.com wrote:
>>>>>>>>
>>>>>>>> From: Wenchao Xia <wenchaolinux@gmail.com>
>>>>>>>>
>>>>>>>>  This series tries to enable the mremap syscall to CoW a private
>>>>>>>> memory region, just like fork() does. As a result, a user-space
>>>>>>>> application would get a mirror of that region, which can be used
>>>>>>>> as a snapshot for further processing.
>>>>>>>>
>>>>>>>
>>>>>>> Why not just fork()? Even if the application was threaded it should be
>>>>>>> manageable to handle fork just for processing the private memory region
>>>>>>> in question. I'm having trouble figuring out what sort of application
>>>>>>> would require an interface like this.
>>>>>>>
>>>>>> It has some problems: parent-child communication, and sometimes
>>>>>> page copying.
>>>>>> I'd like to snapshot a QEMU guest's RAM; the current solution is:
>>>>>> 1) fork()
>>>>>> 2) pipe guest RAM data from child to parent.
>>>>>> 3) parent writes down the contents.
>>>>>>
>>>>>> To avoid complex communication for data control and for protecting
>>>>>> the file contents, the parent rather than the child handles the data,
>>>>>> via a pipe, but this brings an additional copy. I think an explicit API
>>>>>> that CoW-maps a memory region inside one process could avoid it,
>>>>>> be faster, CoW fewer pages, and also make the user-space code nicer.
>>>>>
>>>>> A new Linux-specific API is not portable and not available on existing
>>>>> hosts.  Since QEMU supports non-Linux host operating systems the
>>>>> fork() approach is preferable.
>>>>>
>>>>> If you're worried about the memory copy - which should be benchmarked
>>>>> - then vmsplice(2) can be used in the child process and splice(2) can
>>>>> be used in the parent.  It probably doesn't help though since QEMU
>>>>> scans RAM pages to find all-zero pages before sending them over the
>>>>> socket, and at that point the memory copy might not make much
>>>>> difference.
>>>>>
>>>>> Perhaps other applications can use this new flag better, but for QEMU
>>>>> I think fork()'s portability is more important than the convenience of
>>>>> accessing the CoW pages in the same process.
>>>>
>>>> Yup, I agree with you that a new syscall is sometimes not a good solution.
>>>>
>>>> Currently, we're working on live-update[1], which will be enabled on QEMU
>>>> first. This feature lets the guest run on the new QEMU binary smoothly
>>>> without a restart, which is useful for applying security updates.
>>>>
>>>> In this case, we need to move the guest memory from the old QEMU instance
>>>> to the new one. fork() cannot help because we need to exec() a new
>>>> instance, after which all memory mappings are destroyed.
>>>>
>>>> We tried to enable SPLICE_F_MOVE[2] for vmsplice() to move the memory
>>>> without a memory copy, but the performance isn't as good as we expected,
>>>> due to some limitations: the page size, locking, the message-size limit
>>>> on pipes, etc. Of course, we will continue to improve this, but wenchao's
>>>> patch seems a new direction for us.
>>>>
>>>> To coordinate with your fork() approach, maybe we can introduce a new flag
>>>> for VMAs, something like VM_KEEP_ONEXEC, to tell exec() not to destroy
>>>> this VMA. How about this, or do you guys have a new idea? I'd really
>>>> appreciate your suggestions.
>>>>
>>>> [1] http://marc.info/?l=qemu-devel&m=138597598700844&w=2
>>>> [2] https://lkml.org/lkml/2013/10/25/285
>>>
>>> Hi,
>>>
>>
>> Hi Marcelo,
>>
>>
>>> What is the purpose of snapshotting guest RAM here, in the context of
>>> local migration?
>>
>> RAM snapshotting and local migration are different things.
>> I asked for your suggestions here because I thought
>> they need to do the same thing: move memory from one process
>> to another in an efficient way. What do you think? :)
>
> Another possibility is to use memory that is not anonymous for guest
> RAM, such as hugetlbfs or tmpfs.
>
> IIRC ksm and thp have limitations wrt tmpfs.

Yes, KSM and THP are what we're concerned about.
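
For illustration only, here is a rough user-space sketch of the
file-backed idea above: guest RAM is mapped MAP_SHARED from a file on
tmpfs, so a new process can map the same pages again without a copy.
The names (map_guest_ram, GUEST_RAM_SIZE, demo) and the use of /dev/shm
are assumptions made for this example, not QEMU code, and closing the
first mapping merely stands in for exec() tearing down the old address
space.

```python
import mmap
import os
import tempfile

GUEST_RAM_SIZE = 4 * mmap.PAGESIZE  # hypothetical, tiny stand-in for guest RAM


def map_guest_ram(path, size, create=False):
    """Map `size` bytes of the file at `path` as shared, file-backed memory."""
    flags = os.O_RDWR | (os.O_CREAT if create else 0)
    fd = os.open(path, flags, 0o600)
    try:
        if create:
            os.ftruncate(fd, size)
        # MAP_SHARED: stores land in the file's pages, not in anonymous memory.
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def demo():
    # Assumption: /dev/shm is a writable tmpfs mount; otherwise use the
    # default temporary directory (which may not be tmpfs).
    shm = "/dev/shm"
    base = shm if os.path.isdir(shm) and os.access(shm, os.W_OK) else None
    fd, path = tempfile.mkstemp(dir=base)
    os.close(fd)
    try:
        old = map_guest_ram(path, GUEST_RAM_SIZE, create=True)  # "old" instance
        old[:5] = b"guest"
        old.close()  # stands in for exec() destroying the old mappings
        new = map_guest_ram(path, GUEST_RAM_SIZE)               # "new" instance
        data = bytes(new[:5])
        new.close()
        return data
    finally:
        os.unlink(path)
```

The contents survive because they live in the file rather than in
anonymous memory, which is also why the KSM and THP limitations for
tmpfs apply to this approach.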

>
> Still curious about RAM snapshotting.

Wen Chao, could you please explain it in more detail?
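
For reference, the fork()-based snapshot flow quoted earlier in the
thread (fork, pipe guest RAM from child to parent, parent writes down
the contents) can be sketched roughly as below. This is only a
simplified illustration under assumed names (snapshot_via_fork,
RAM_SIZE, demo) with a small anonymous private mapping standing in for
guest RAM; it is not the actual QEMU implementation.

```python
import mmap
import os

RAM_SIZE = 16 * mmap.PAGESIZE  # hypothetical, tiny stand-in for guest RAM


def snapshot_via_fork(ram):
    """Return the contents of `ram` as of the fork(), read back over a pipe."""
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees a copy-on-write view of `ram` frozen at fork() time.
        status = 1
        try:
            os.close(rfd)
            view = memoryview(ram)
            off = 0
            while off < len(view):
                off += os.write(wfd, view[off:])
            status = 0
        finally:
            os._exit(status)
    # Parent: the "guest" keeps running and dirties memory after the fork.
    os.close(wfd)
    ram[:7] = b"changed"
    chunks = []
    while True:
        chunk = os.read(rfd, 65536)  # this read is the extra copy discussed
        if not chunk:
            break
        chunks.append(chunk)
    os.close(rfd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


def demo():
    ram = mmap.mmap(-1, RAM_SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    ram[:7] = b"initial"
    snap = snapshot_via_fork(ram)
    live = bytes(ram[:7])
    ram.close()
    return snap[:7], live, len(snap)
```

The parent's write after fork() does not show up in the snapshot,
because the child keeps the pre-fork pages, while the pipe transfer is
the additional copy that the in-process CoW mapping was meant to avoid.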
