From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <53F14A76.2020608@huawei.com>
Date: Mon, 18 Aug 2014 08:36:06 +0800
From: zhanghailiang
In-Reply-To: <20140814095651.GA30944@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
To: "Michael S. Tsirkin"
Cc: Hu Tao, luonengjun@huawei.com, peter.huangpeng@huawei.com, xiexiangyou, qemu-devel@nongnu.org, aliguori@amazon.com, pbonzini@redhat.com, imammedo@redhat.com, gaowanlong@cn.fujitsu.com

On 2014/8/14 17:56, Michael S. Tsirkin wrote:
> On Thu, Aug 14, 2014 at 05:09:00PM +0800, zhanghailiang wrote:
>> On 2014/8/14 15:15, Hu Tao wrote:
>>> On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote:
>>>> On 2014/8/13 19:50, Michael S. Tsirkin wrote:
>>>>> On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
>>>>>> If we configure QEMU with realtime-mlock-on and memory-node-bind at the
>>>>>> same time, QEMU fails to start, and mbind() fails with the message
>>>>>> "Input/output error".
>>>>>>
>>>>>> From the man page:
>>>>>>
>>>>>>     int mbind(void *addr, unsigned long len, int mode,
>>>>>>               unsigned long *nodemask, unsigned long maxnode,
>>>>>>               unsigned flags);
>>>>>>
>>>>>> The MPOL_BIND mode specifies a strict policy that restricts memory
>>>>>> allocation to the nodes specified in nodemask. If MPOL_MF_STRICT is
>>>>>> passed in flags and the policy is not MPOL_DEFAULT (in QEMU it is
>>>>>> MPOL_BIND), the call fails with the error EIO if the existing pages
>>>>>> in the memory range do not follow the policy.
>>>>>>
>>>>>> Memory locked ahead of time by mlockall() is not guaranteed to follow
>>>>>> the policy above, and when it does not, mbind() fails with EIO.
>>>>>>
>>>>>> So we should call mlock after mbind. This patch moves the mlock call
>>>>>> to pc_memory_init().
>>>>>>
>>>>>> Signed-off-by: xiexiangyou
>>>>>> Signed-off-by: zhanghailiang
>>>>>
>>>>> OK, but won't this still fail in the case of memory hotplug?
>>>>> We set MCL_FUTURE, so the same will apply?
>>>
>>> It has already been set.
>>>
>>>>> Maybe it's enough to set MPOL_MF_MOVE?
>>>>> Does the following work for you?
>>>>
>>>> Hi Michael,
>>>>
>>>> I have tested memory hotplug with a virsh command like
>>>> 'virsh setmem redhat-6.4 6388608 --config --live'; it works, and
>>>> mbind is not called for that kind of memory hotplug.
>>>> But I don't know whether there is a command like 'memory-node hotplug'?
>>>
>>> Use the QEMU monitor command:
>>> object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind
>>
>> Hi Hu Tao,
>>
>> I have tested it using the above command, and yes, it fails.
>> If I remove the MCL_FUTURE flag from mlockall(), it works fine.
>>
>> MCL_FUTURE: "Lock all pages which will become mapped into the
>> address space of the process in the future." (From the man page.)
>>
>> So I think we could remove this flag and call mlockall(MCL_CURRENT)
>> every time we do the 'object_add' above.
>
> No, that's wrong. We want allocations of QEMU memory locked as well,
> not just guest memory.
>
>>>> MPOL_MF_MOVE can work; it is simpler, but it is not perfect. It
>>>> costs more time to move the memory (I guess it reconstructs pages
>>>> and copies memory) that has already been locked by mlockall. As a
>>>> result, the VM starts more slowly than in the scenario above.
>>>
>>> I think your patch makes sense, but it doesn't work for hotplugged
>>> memory. We need MPOL_MF_MOVE, too.
>>
>> Thank you very much. Before sending another patch, I will look into
>> QEMU's memory hotplug path and try to solve this fault. :)
>
> Does setting MPOL_MF_MOVE solve all problems?

Hmm, if we don't consider its side effect (the VM starts more slowly),
that solves all problems. :)

>>>> BTW, I think the following process is clearer and more logical:
>>>> allocate memory ---> set memory policy ---> lock memory.
>>>>
>>>> So what's your opinion? Thanks very much.
>>>>
>>>>> -->
>>>>>
>>>>> hostmem: set MPOL_MF_MOVE
>>>>>
>>>>> When memory is allocated on a wrong node, MPOL_MF_STRICT
>>>>> doesn't move it - it just fails the allocation.
>>>>> A simple way to reproduce the failure is with the mlock=on
>>>>> realtime feature.
>>>>>
>>>>> The code comment actually says: "ensure policy won't be ignored",
>>>>> so setting MPOL_MF_MOVE seems like a better way to do this.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin
>>>>>
>>>>> ---
>>>>>
>>>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>>>> index ca10c51..a9905c0 100644
>>>>> --- a/backends/hostmem.c
>>>>> +++ b/backends/hostmem.c
>>>>> @@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>>>>>          /* ensure policy won't be ignored in case memory is preallocated
>>>>>           * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
>>>>>           * this doesn't catch hugepage case. */
>>>>> -        unsigned flags = MPOL_MF_STRICT;
>>>>> +        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
>>>>>
>>>>>          /* check for invalid host-nodes and policies and give more verbose
>>>>>           * error messages than mbind(). */