Message-ID: <53EC57CD.3020503@huawei.com>
Date: Thu, 14 Aug 2014 14:31:41 +0800
From: zhanghailiang
References: <1407928917-16220-1-git-send-email-zhang.zhanghailiang@huawei.com> <20140813115020.GC20244@redhat.com>
In-Reply-To: <20140813115020.GC20244@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
To: "Michael S. Tsirkin"
Cc: hutao@cn.fujitsu.com, luonengjun@huawei.com, peter.huangpeng@huawei.com, xiexiangyou, qemu-devel@nongnu.org, aliguori@amazon.com, imammedo@redhat.com, pbonzini@redhat.com, gaowanlong@cn.fujitsu.com

On 2014/8/13 19:50, Michael S. Tsirkin wrote:
> On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
>> If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
>> Qemu will fail to start, and mbind() fails with message "Input/output error".
>>
>> From man page:
>>     int mbind(void *addr, unsigned long len, int mode,
>>               unsigned long *nodemask, unsigned long maxnode,
>>               unsigned flags);
>> The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
>> to the nodes specified in nodemask.
>> If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
>> here is MPOL_BIND), then the call will fail with the error EIO if the existing
>> pages in the memory range don't follow the policy.
>>
>> The memory locked ahead by mlockall can not guarantee to follow the policy above,
>> And if that happens, it will result in an EIO error.
>>
>> So we should call mlock after mbind, here we adjust the place where called mlock,
>> Move it to function pc_memory_init.
>>
>> Signed-off-by: xiexiangyou
>> Signed-off-by: zhanghailiang
>
> OK but won't this still fail in case of memory hotplug?
> We set MCL_FUTURE so the same will apply?
> Maybe it's enough to set MPOL_MF_MOVE?
> Does the following work for you?
>

Hi Michael,

I have tested memory hotplug with a virsh command such as
'virsh setmem redhat-6.4 6388608 --config --live', and it works: mbind()
is not called for that kind of memory hotplug. But I don't know whether
there is a command like 'memory-node hotplug'?

MPOL_MF_MOVE does work and is simpler, but it is not perfect. It takes
more time to *move the memory* that has already been locked by mlockall
(I guess it has to reconstruct the pages and copy the memory). As a
result, the VM starts more slowly than with the approach above.

BTW, I think the following sequence is clearer and more logical:
Allocate memory ---> Set memory policy ---> Lock memory.

So what's your opinion? Thanks very much.

> -->
>
> hostmem: set MPOL_MF_MOVE
>
> When memory is allocated on a wrong node, MPOL_MF_STRICT
> doesn't move it - it just fails the allocation.
> A simple way to reproduce the failure is with mlock=on
> realtime feature.
>
> The code comment actually says: "ensure policy won't be ignored"
> so setting MPOL_MF_MOVE seems like a better way to do this.
>
> Signed-off-by: Michael S. Tsirkin
>
> ---
>
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index ca10c51..a9905c0 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>          /* ensure policy won't be ignored in case memory is preallocated
>           * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
>           * this doesn't catch hugepage case. */
> -        unsigned flags = MPOL_MF_STRICT;
> +        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
>
>          /* check for invalid host-nodes and policies and give more verbose
>           * error messages than mbind(). */
>
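
For illustration only, the two options discussed in this thread (binding
before locking, and optionally adding MPOL_MF_MOVE) could be sketched in
standalone C roughly as below. This is not QEMU code. The helper names
bind_flags(), raw_mbind() and alloc_bind_lock() are made up for this sketch,
mbind is reached through the raw syscall so that libnuma is not needed, and
mlock() on a single range stands in for the mlockall() call that QEMU uses.
It assumes a Linux host with the kernel uapi headers installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>  /* MPOL_BIND, MPOL_MF_STRICT, MPOL_MF_MOVE */

/* Hypothetical helper: MPOL_MF_STRICT alone makes mbind() fail with EIO
 * when existing pages violate the policy; adding MPOL_MF_MOVE asks the
 * kernel to try migrating those pages first. */
static unsigned bind_flags(int allow_move)
{
    unsigned flags = MPOL_MF_STRICT;
    if (allow_move) {
        flags |= MPOL_MF_MOVE;
    }
    return flags;
}

/* Hypothetical thin wrapper around the mbind system call. */
static long raw_mbind(void *addr, unsigned long len, int mode,
                      const unsigned long *nodemask,
                      unsigned long maxnode, unsigned flags)
{
    return syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
}

/* Allocate ---> set memory policy ---> lock memory.
 * Returns the mapping on success, or NULL with *err set to the errno of
 * the step that failed (this can happen on hosts without NUMA support,
 * in restricted sandboxes, or when RLIMIT_MEMLOCK is too small). */
static void *alloc_bind_lock(size_t len, unsigned node, int allow_move,
                             int *err)
{
    /* The sketch uses a single-word node mask. */
    assert(node < 8 * sizeof(unsigned long));
    unsigned long mask = 1UL << node;

    /* Step 1: allocate. Nothing is locked or touched yet. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return NULL;
    }

    /* Step 2: bind the range to the node, then step 3: lock it.
     * maxnode is "bits in mask + 1", the convention libnuma also uses. */
    if (raw_mbind(p, len, MPOL_BIND, &mask, 8 * sizeof(mask) + 1,
                  bind_flags(allow_move)) != 0 ||
        mlock(p, len) != 0) {
        *err = errno;  /* save before munmap() can overwrite it */
        munmap(p, len);
        return NULL;
    }

    *err = 0;
    return p;
}
```

Because the range is bound before any page is faulted in or locked, there
are no pre-existing pages for MPOL_MF_STRICT to reject, so allow_move can
stay 0 and no migration cost is paid. A caller might use
alloc_bind_lock(len, 0, 0, &err) and release the range with munmap().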