From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59596) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XHr8W-0004ap-L3 for qemu-devel@nongnu.org; Thu, 14 Aug 2014 05:16:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XHr8R-0001Q9-P1 for qemu-devel@nongnu.org; Thu, 14 Aug 2014 05:15:56 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:16990) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XHr8Q-000174-Ty for qemu-devel@nongnu.org; Thu, 14 Aug 2014 05:15:51 -0400 Message-ID: <53EC7CAC.2020509@huawei.com> Date: Thu, 14 Aug 2014 17:09:00 +0800 From: zhanghailiang MIME-Version: 1.0 References: <1407928917-16220-1-git-send-email-zhang.zhanghailiang@huawei.com> <20140813115020.GC20244@redhat.com> <53EC57CD.3020503@huawei.com> <20140814071503.GA18294@G08FNSTD100614.fnst.cn.fujitsu.com> In-Reply-To: <20140814071503.GA18294@G08FNSTD100614.fnst.cn.fujitsu.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Hu Tao Cc: "Michael S. Tsirkin" , luonengjun@huawei.com, qemu-devel@nongnu.org, xiexiangyou , peter.huangpeng@huawei.com, aliguori@amazon.com, pbonzini@redhat.com, imammedo@redhat.com, gaowanlong@cn.fujitsu.com On 2014/8/14 15:15, Hu Tao wrote: > On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote: >> On 2014/8/13 19:50, Michael S. Tsirkin wrote: >>> On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote: >>>> If we configure qemu with realtime-mlock-on and memory-node-bind at the same time, >>>> Qemu will fail to start, and mbind() fails with message "Input/output error". >>>> >>>>> From man page: >>>> int mbind(void *addr, unsigned long len, int mode, >>>> unsigned long *nodemask, unsigned long maxnode, >>>> unsigned flags); >>>> The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation >>>> to the nodes specified in nodemask. >>>> If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu >>>> here is MPOL_BIND), then the call will fail with the error EIO if the existing >>>> pages in the memory range don't follow the policy. >>>> >>>> The memory locked ahead by mlockall can not guarantee to follow the policy above, >>>> And if that happens, it will result in an EIO error. >>>> >>>> So we should call mlock after mbind, here we adjust the place where called mlock, >>>> Move it to function pc_memory_init. >>>> >>>> Signed-off-by: xiexiangyou >>>> Signed-off-by: zhanghailiang >>> >>> OK but won't this still fail in case of memory hotplug? >>> We set MCL_FUTURE so the same will apply? > > It has already been set. > >>> Maybe it's enough to set MPOL_MF_MOVE? >>> Does the following work for you? >>> >> >> Hi Michael, >> >> I have tested memory hotplug, use virsh command like >> 'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and >> it will not call mbind when do such memory hotplug. >> but i don't know if there is command like 'memory-node hotplug' ? > > Using qemu monitor command: > object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind > Hi Hu Tao, I have tested it use the above command, and yes, it failed. And if i remove the *MCL_FUTURE* flag of mlockall(), it will work fine. *MCL_FUTURE* Lock all pages which will become mapped into the address space of the process in the future.(From man page) So i think we could remove this flag, and call mlockall(MCL_CURRENT) every time when we do the above 'memory object add'. >> >> MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It >> consumes more time to *move the memory*(i guess will reconstruct >> pages and copy memory)which has been locked by mlockall. The result >> is VM will start slower than the above scenario. > > I think your patch makes sense but it doesn't work for hotplugged > memory. We need MPOL_MF_MOVE, too. > Thank you very much, before send another patch, I will look into the qemu process of hotplug memory, and try to solve this fault.:) >> >> BTW, i think the follow process is clearer and more logical: >> Allocate memory--->Set memory policy--->Lock memory. >> >> So what's your opinion? Thanks very much. >> >> >>> --> >>> >>> hostmem: set MPOL_MF_MOVE >>> >>> When memory is allocated on a wrong node, MPOL_MF_STRICT >>> doesn't move it - it just fails the allocation. >>> A simple way to reproduce the failure is with mlock=on >>> realtime feature. >>> >>> The code comment actually says: "ensure policy won't be ignored" >>> so setting MPOL_MF_MOVE seems like a better way to do this. >>> >>> Signed-off-by: Michael S. Tsirkin >>> >>> --- >>> >>> diff --git a/backends/hostmem.c b/backends/hostmem.c >>> index ca10c51..a9905c0 100644 >>> --- a/backends/hostmem.c >>> +++ b/backends/hostmem.c >>> @@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) >>> /* ensure policy won't be ignored in case memory is preallocated >>> * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so >>> * this doesn't catch hugepage case. */ >>> - unsigned flags = MPOL_MF_STRICT; >>> + unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE; >>> >>> /* check for invalid host-nodes and policies and give more verbose >>> * error messages than mbind(). */ >>> >> > > > . >