qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
@ 2014-08-13 11:21 zhanghailiang
  2014-08-13 11:50 ` Michael S. Tsirkin
  0 siblings, 1 reply; 7+ messages in thread
From: zhanghailiang @ 2014-08-13 11:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhanghailiang, mst, hutao, luonengjun, peter.huangpeng,
	xiexiangyou, aliguori, pbonzini, gaowanlong

If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
Qemu will fail to start, and mbind() fails with message "Input/output error".

>From man page:
int mbind(void *addr, unsigned long len, int mode,
                 unsigned long *nodemask, unsigned long maxnode,
                 unsigned flags);
The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
to the nodes specified in nodemask.
If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
here is MPOL_BIND), then the call will fail with the error EIO if the existing
pages in  the memory range don't follow the policy.

The memory locked ahead by mlockall can not guarantee to follow the policy above,
And if that happens, it will result in an EIO error.

So we should call mlock after mbind, here we adjust the place where called mlock,
Move it to function pc_memory_init.

Signed-off-by: xiexiangyou <xiexiangyou@huawei.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 hw/i386/pc.c            |  8 ++++++++
 include/sysemu/sysemu.h |  1 +
 vl.c                    | 10 +---------
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9e58982..08a03c2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1321,6 +1321,14 @@ FWCfgState *pc_memory_init(MachineState *machine,
     for (i = 0; i < nb_option_roms; i++) {
         rom_add_option(option_rom[i].name, option_rom[i].bootindex);
     }
+
+    if (enable_mlock) {
+        if (os_mlock() < 0) {
+            error_report("qemu: locking memory failed\n");
+            exit(EXIT_FAILURE);
+        }
+    }
+
     guest_info->fw_cfg = fw_cfg;
     return fw_cfg;
 }
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8539fd..b61e78f 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -136,6 +136,7 @@ extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 extern const char *mem_path;
 extern int mem_prealloc;
+extern bool enable_mlock;
 
 #define MAX_NODES 128
 
diff --git a/vl.c b/vl.c
index a8029d5..9a19d97 100644
--- a/vl.c
+++ b/vl.c
@@ -134,6 +134,7 @@ const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
+bool enable_mlock = false;
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
@@ -1419,16 +1420,7 @@ static void smp_parse(QemuOpts *opts)
 
 static void configure_realtime(QemuOpts *opts)
 {
-    bool enable_mlock;
-
     enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
-
-    if (enable_mlock) {
-        if (os_mlock() < 0) {
-            fprintf(stderr, "qemu: locking memory failed\n");
-            exit(1);
-        }
-    }
 }
 
 
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
  2014-08-13 11:21 [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind zhanghailiang
@ 2014-08-13 11:50 ` Michael S. Tsirkin
  2014-08-14  6:31   ` zhanghailiang
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2014-08-13 11:50 UTC (permalink / raw)
  To: zhanghailiang
  Cc: hutao, luonengjun, peter.huangpeng, xiexiangyou, qemu-devel,
	aliguori, imammedo, pbonzini, gaowanlong

On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
> If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
> Qemu will fail to start, and mbind() fails with message "Input/output error".
> 
> >From man page:
> int mbind(void *addr, unsigned long len, int mode,
>                  unsigned long *nodemask, unsigned long maxnode,
>                  unsigned flags);
> The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
> to the nodes specified in nodemask.
> If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
> here is MPOL_BIND), then the call will fail with the error EIO if the existing
> pages in  the memory range don't follow the policy.
> 
> The memory locked ahead by mlockall can not guarantee to follow the policy above,
> And if that happens, it will result in an EIO error.
> 
> So we should call mlock after mbind, here we adjust the place where called mlock,
> Move it to function pc_memory_init.
> 
> Signed-off-by: xiexiangyou <xiexiangyou@huawei.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

OK but won't this still fail in case of memory hotplug?
We set MCL_FUTURE so the same will apply?
Maybe it's enough to set MPOL_MF_MOVE?
Does the following work for you?

-->

hostmem: set MPOL_MF_MOVE

When memory is allocated on a wrong node, MPOL_MF_STRICT
doesn't move it - it just fails the allocation.
A simple way to reproduce the failure is with mlock=on
realtime feature.

The code comment actually says: "ensure policy won't be ignored"
so setting MPOL_MF_MOVE seems like a better way to do this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/backends/hostmem.c b/backends/hostmem.c
index ca10c51..a9905c0 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         /* ensure policy won't be ignored in case memory is preallocated
          * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
          * this doesn't catch hugepage case. */
-        unsigned flags = MPOL_MF_STRICT;
+        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
 
         /* check for invalid host-nodes and policies and give more verbose
          * error messages than mbind(). */

-- 
MST

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
  2014-08-13 11:50 ` Michael S. Tsirkin
@ 2014-08-14  6:31   ` zhanghailiang
  2014-08-14  7:15     ` Hu Tao
  0 siblings, 1 reply; 7+ messages in thread
From: zhanghailiang @ 2014-08-14  6:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: hutao, luonengjun, peter.huangpeng, xiexiangyou, qemu-devel,
	aliguori, imammedo, pbonzini, gaowanlong

On 2014/8/13 19:50, Michael S. Tsirkin wrote:
> On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
>> If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
>> Qemu will fail to start, and mbind() fails with message "Input/output error".
>>
>> > From man page:
>> int mbind(void *addr, unsigned long len, int mode,
>>                   unsigned long *nodemask, unsigned long maxnode,
>>                   unsigned flags);
>> The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
>> to the nodes specified in nodemask.
>> If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
>> here is MPOL_BIND), then the call will fail with the error EIO if the existing
>> pages in  the memory range don't follow the policy.
>>
>> The memory locked ahead by mlockall can not guarantee to follow the policy above,
>> And if that happens, it will result in an EIO error.
>>
>> So we should call mlock after mbind, here we adjust the place where called mlock,
>> Move it to function pc_memory_init.
>>
>> Signed-off-by: xiexiangyou<xiexiangyou@huawei.com>
>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>
> OK but won't this still fail in case of memory hotplug?
> We set MCL_FUTURE so the same will apply?
> Maybe it's enough to set MPOL_MF_MOVE?
> Does the following work for you?
>

Hi Michael,

I have tested memory hotplug, use virsh command like
'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and
it will not call mbind when do such memory hotplug.
but i don't know if there is command like 'memory-node hotplug' ?

MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It 
consumes more time to *move the memory*(i guess will reconstruct pages 
and copy memory)which has been locked by mlockall. The result is VM will 
start slower than the above scenario.

BTW, i think the follow process is clearer and more logical:
Allocate memory--->Set memory policy--->Lock memory.

So what's your opinion? Thanks very much.


> -->
>
> hostmem: set MPOL_MF_MOVE
>
> When memory is allocated on a wrong node, MPOL_MF_STRICT
> doesn't move it - it just fails the allocation.
> A simple way to reproduce the failure is with mlock=on
> realtime feature.
>
> The code comment actually says: "ensure policy won't be ignored"
> so setting MPOL_MF_MOVE seems like a better way to do this.
>
> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>
> ---
>
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index ca10c51..a9905c0 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>           /* ensure policy won't be ignored in case memory is preallocated
>            * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
>            * this doesn't catch hugepage case. */
> -        unsigned flags = MPOL_MF_STRICT;
> +        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
>
>           /* check for invalid host-nodes and policies and give more verbose
>            * error messages than mbind(). */
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
  2014-08-14  6:31   ` zhanghailiang
@ 2014-08-14  7:15     ` Hu Tao
  2014-08-14  9:09       ` zhanghailiang
  0 siblings, 1 reply; 7+ messages in thread
From: Hu Tao @ 2014-08-14  7:15 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Michael S. Tsirkin, luonengjun, peter.huangpeng, xiexiangyou,
	qemu-devel, aliguori, imammedo, pbonzini, gaowanlong

On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote:
> On 2014/8/13 19:50, Michael S. Tsirkin wrote:
> >On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
> >>If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
> >>Qemu will fail to start, and mbind() fails with message "Input/output error".
> >>
> >>> From man page:
> >>int mbind(void *addr, unsigned long len, int mode,
> >>                  unsigned long *nodemask, unsigned long maxnode,
> >>                  unsigned flags);
> >>The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
> >>to the nodes specified in nodemask.
> >>If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
> >>here is MPOL_BIND), then the call will fail with the error EIO if the existing
> >>pages in  the memory range don't follow the policy.
> >>
> >>The memory locked ahead by mlockall can not guarantee to follow the policy above,
> >>And if that happens, it will result in an EIO error.
> >>
> >>So we should call mlock after mbind, here we adjust the place where called mlock,
> >>Move it to function pc_memory_init.
> >>
> >>Signed-off-by: xiexiangyou<xiexiangyou@huawei.com>
> >>Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
> >
> >OK but won't this still fail in case of memory hotplug?
> >We set MCL_FUTURE so the same will apply?

It has already been set.

> >Maybe it's enough to set MPOL_MF_MOVE?
> >Does the following work for you?
> >
> 
> Hi Michael,
> 
> I have tested memory hotplug, use virsh command like
> 'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and
> it will not call mbind when do such memory hotplug.
> but i don't know if there is command like 'memory-node hotplug' ?

Using qemu monitor command:
object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind

> 
> MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It
> consumes more time to *move the memory*(i guess will reconstruct
> pages and copy memory)which has been locked by mlockall. The result
> is VM will start slower than the above scenario.

I think your patch makes sense but it doesn't work for hotplugged
memory. We need MPOL_MF_MOVE, too.

> 
> BTW, i think the follow process is clearer and more logical:
> Allocate memory--->Set memory policy--->Lock memory.
> 
> So what's your opinion? Thanks very much.
> 
> 
> >-->
> >
> >hostmem: set MPOL_MF_MOVE
> >
> >When memory is allocated on a wrong node, MPOL_MF_STRICT
> >doesn't move it - it just fails the allocation.
> >A simple way to reproduce the failure is with mlock=on
> >realtime feature.
> >
> >The code comment actually says: "ensure policy won't be ignored"
> >so setting MPOL_MF_MOVE seems like a better way to do this.
> >
> >Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >
> >---
> >
> >diff --git a/backends/hostmem.c b/backends/hostmem.c
> >index ca10c51..a9905c0 100644
> >--- a/backends/hostmem.c
> >+++ b/backends/hostmem.c
> >@@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
> >          /* ensure policy won't be ignored in case memory is preallocated
> >           * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
> >           * this doesn't catch hugepage case. */
> >-        unsigned flags = MPOL_MF_STRICT;
> >+        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
> >
> >          /* check for invalid host-nodes and policies and give more verbose
> >           * error messages than mbind(). */
> >
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
  2014-08-14  7:15     ` Hu Tao
@ 2014-08-14  9:09       ` zhanghailiang
  2014-08-14  9:56         ` Michael S. Tsirkin
  0 siblings, 1 reply; 7+ messages in thread
From: zhanghailiang @ 2014-08-14  9:09 UTC (permalink / raw)
  To: Hu Tao
  Cc: Michael S. Tsirkin, luonengjun, qemu-devel, xiexiangyou,
	peter.huangpeng, aliguori, pbonzini, imammedo, gaowanlong

On 2014/8/14 15:15, Hu Tao wrote:
> On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote:
>> On 2014/8/13 19:50, Michael S. Tsirkin wrote:
>>> On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
>>>> If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
>>>> Qemu will fail to start, and mbind() fails with message "Input/output error".
>>>>
>>>>>  From man page:
>>>> int mbind(void *addr, unsigned long len, int mode,
>>>>                   unsigned long *nodemask, unsigned long maxnode,
>>>>                   unsigned flags);
>>>> The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
>>>> to the nodes specified in nodemask.
>>>> If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
>>>> here is MPOL_BIND), then the call will fail with the error EIO if the existing
>>>> pages in  the memory range don't follow the policy.
>>>>
>>>> The memory locked ahead by mlockall can not guarantee to follow the policy above,
>>>> And if that happens, it will result in an EIO error.
>>>>
>>>> So we should call mlock after mbind, here we adjust the place where called mlock,
>>>> Move it to function pc_memory_init.
>>>>
>>>> Signed-off-by: xiexiangyou<xiexiangyou@huawei.com>
>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>
>>> OK but won't this still fail in case of memory hotplug?
>>> We set MCL_FUTURE so the same will apply?
>
> It has already been set.
>
>>> Maybe it's enough to set MPOL_MF_MOVE?
>>> Does the following work for you?
>>>
>>
>> Hi Michael,
>>
>> I have tested memory hotplug, use virsh command like
>> 'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and
>> it will not call mbind when do such memory hotplug.
>> but i don't know if there is command like 'memory-node hotplug' ?
>
> Using qemu monitor command:
> object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind
>

Hi Hu Tao,

I have tested it use the above command, and yes, it failed.
And if i remove the *MCL_FUTURE* flag of mlockall(), it will work fine.

*MCL_FUTURE*  Lock  all  pages  which  will become mapped into the
address space of the process in the future.(From man page)

So i think we could remove this flag, and call mlockall(MCL_CURRENT)
every time when we do the above 'memory object add'.

>>
>> MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It
>> consumes more time to *move the memory*(i guess will reconstruct
>> pages and copy memory)which has been locked by mlockall. The result
>> is VM will start slower than the above scenario.
>
> I think your patch makes sense but it doesn't work for hotplugged
> memory. We need MPOL_MF_MOVE, too.
>

Thank you very much, before send another patch, I will look into the
qemu process of hotplug memory, and try to solve this fault.:)

>>
>> BTW, i think the follow process is clearer and more logical:
>> Allocate memory--->Set memory policy--->Lock memory.
>>
>> So what's your opinion? Thanks very much.
>>
>>
>>> -->
>>>
>>> hostmem: set MPOL_MF_MOVE
>>>
>>> When memory is allocated on a wrong node, MPOL_MF_STRICT
>>> doesn't move it - it just fails the allocation.
>>> A simple way to reproduce the failure is with mlock=on
>>> realtime feature.
>>>
>>> The code comment actually says: "ensure policy won't be ignored"
>>> so setting MPOL_MF_MOVE seems like a better way to do this.
>>>
>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>
>>> ---
>>>
>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>> index ca10c51..a9905c0 100644
>>> --- a/backends/hostmem.c
>>> +++ b/backends/hostmem.c
>>> @@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>>>           /* ensure policy won't be ignored in case memory is preallocated
>>>            * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
>>>            * this doesn't catch hugepage case. */
>>> -        unsigned flags = MPOL_MF_STRICT;
>>> +        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
>>>
>>>           /* check for invalid host-nodes and policies and give more verbose
>>>            * error messages than mbind(). */
>>>
>>
>
>
> .
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
  2014-08-14  9:09       ` zhanghailiang
@ 2014-08-14  9:56         ` Michael S. Tsirkin
  2014-08-18  0:36           ` zhanghailiang
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2014-08-14  9:56 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Hu Tao, luonengjun, peter.huangpeng, xiexiangyou, qemu-devel,
	aliguori, pbonzini, imammedo, gaowanlong

On Thu, Aug 14, 2014 at 05:09:00PM +0800, zhanghailiang wrote:
> On 2014/8/14 15:15, Hu Tao wrote:
> >On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote:
> >>On 2014/8/13 19:50, Michael S. Tsirkin wrote:
> >>>On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
> >>>>If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
> >>>>Qemu will fail to start, and mbind() fails with message "Input/output error".
> >>>>
> >>>>> From man page:
> >>>>int mbind(void *addr, unsigned long len, int mode,
> >>>>                  unsigned long *nodemask, unsigned long maxnode,
> >>>>                  unsigned flags);
> >>>>The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
> >>>>to the nodes specified in nodemask.
> >>>>If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
> >>>>here is MPOL_BIND), then the call will fail with the error EIO if the existing
> >>>>pages in  the memory range don't follow the policy.
> >>>>
> >>>>The memory locked ahead by mlockall can not guarantee to follow the policy above,
> >>>>And if that happens, it will result in an EIO error.
> >>>>
> >>>>So we should call mlock after mbind, here we adjust the place where called mlock,
> >>>>Move it to function pc_memory_init.
> >>>>
> >>>>Signed-off-by: xiexiangyou<xiexiangyou@huawei.com>
> >>>>Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
> >>>
> >>>OK but won't this still fail in case of memory hotplug?
> >>>We set MCL_FUTURE so the same will apply?
> >
> >It has already been set.
> >
> >>>Maybe it's enough to set MPOL_MF_MOVE?
> >>>Does the following work for you?
> >>>
> >>
> >>Hi Michael,
> >>
> >>I have tested memory hotplug, use virsh command like
> >>'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and
> >>it will not call mbind when do such memory hotplug.
> >>but i don't know if there is command like 'memory-node hotplug' ?
> >
> >Using qemu monitor command:
> >object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind
> >
> 
> Hi Hu Tao,
> 
> I have tested it use the above command, and yes, it failed.
> And if i remove the *MCL_FUTURE* flag of mlockall(), it will work fine.
> 
> *MCL_FUTURE*  Lock  all  pages  which  will become mapped into the
> address space of the process in the future.(From man page)
> 
> So i think we could remove this flag, and call mlockall(MCL_CURRENT)
> every time when we do the above 'memory object add'.

No that's wrong. We want allocations of qemu memory locked as well -
not just guest memory.

> >>
> >>MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It
> >>consumes more time to *move the memory*(i guess will reconstruct
> >>pages and copy memory)which has been locked by mlockall. The result
> >>is VM will start slower than the above scenario.
> >
> >I think your patch makes sense but it doesn't work for hotplugged
> >memory. We need MPOL_MF_MOVE, too.
> >
> 
> Thank you very much, before send another patch, I will look into the
> qemu process of hotplug memory, and try to solve this fault.:)

Does setting MPOL_MF_MOVE solve all problems?

> >>
> >>BTW, i think the follow process is clearer and more logical:
> >>Allocate memory--->Set memory policy--->Lock memory.
> >>
> >>So what's your opinion? Thanks very much.
> >>
> >>
> >>>-->
> >>>
> >>>hostmem: set MPOL_MF_MOVE
> >>>
> >>>When memory is allocated on a wrong node, MPOL_MF_STRICT
> >>>doesn't move it - it just fails the allocation.
> >>>A simple way to reproduce the failure is with mlock=on
> >>>realtime feature.
> >>>
> >>>The code comment actually says: "ensure policy won't be ignored"
> >>>so setting MPOL_MF_MOVE seems like a better way to do this.
> >>>
> >>>Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >>>
> >>>---
> >>>
> >>>diff --git a/backends/hostmem.c b/backends/hostmem.c
> >>>index ca10c51..a9905c0 100644
> >>>--- a/backends/hostmem.c
> >>>+++ b/backends/hostmem.c
> >>>@@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
> >>>          /* ensure policy won't be ignored in case memory is preallocated
> >>>           * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
> >>>           * this doesn't catch hugepage case. */
> >>>-        unsigned flags = MPOL_MF_STRICT;
> >>>+        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
> >>>
> >>>          /* check for invalid host-nodes and policies and give more verbose
> >>>           * error messages than mbind(). */
> >>>
> >>
> >
> >
> >.
> >
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind
  2014-08-14  9:56         ` Michael S. Tsirkin
@ 2014-08-18  0:36           ` zhanghailiang
  0 siblings, 0 replies; 7+ messages in thread
From: zhanghailiang @ 2014-08-18  0:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hu Tao, luonengjun, peter.huangpeng, xiexiangyou, qemu-devel,
	aliguori, pbonzini, imammedo, gaowanlong

On 2014/8/14 17:56, Michael S. Tsirkin wrote:
> On Thu, Aug 14, 2014 at 05:09:00PM +0800, zhanghailiang wrote:
>> On 2014/8/14 15:15, Hu Tao wrote:
>>> On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote:
>>>> On 2014/8/13 19:50, Michael S. Tsirkin wrote:
>>>>> On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote:
>>>>>> If we configure qemu with realtime-mlock-on and memory-node-bind at the same time,
>>>>>> Qemu will fail to start, and mbind() fails with message "Input/output error".
>>>>>>
>>>>>>>  From man page:
>>>>>> int mbind(void *addr, unsigned long len, int mode,
>>>>>>                   unsigned long *nodemask, unsigned long maxnode,
>>>>>>                   unsigned flags);
>>>>>> The *MPOL_BIND* mode specifies a strict policy that restricts memory allocation
>>>>>> to the nodes specified in nodemask.
>>>>>> If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In qemu
>>>>>> here is MPOL_BIND), then the call will fail with the error EIO if the existing
>>>>>> pages in  the memory range don't follow the policy.
>>>>>>
>>>>>> The memory locked ahead by mlockall can not guarantee to follow the policy above,
>>>>>> And if that happens, it will result in an EIO error.
>>>>>>
>>>>>> So we should call mlock after mbind, here we adjust the place where called mlock,
>>>>>> Move it to function pc_memory_init.
>>>>>>
>>>>>> Signed-off-by: xiexiangyou<xiexiangyou@huawei.com>
>>>>>> Signed-off-by: zhanghailiang<zhang.zhanghailiang@huawei.com>
>>>>>
>>>>> OK but won't this still fail in case of memory hotplug?
>>>>> We set MCL_FUTURE so the same will apply?
>>>
>>> It has already been set.
>>>
>>>>> Maybe it's enough to set MPOL_MF_MOVE?
>>>>> Does the following work for you?
>>>>>
>>>>
>>>> Hi Michael,
>>>>
>>>> I have tested memory hotplug, use virsh command like
>>>> 'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and
>>>> it will not call mbind when do such memory hotplug.
>>>> but i don't know if there is command like 'memory-node hotplug' ?
>>>
>>> Using qemu monitor command:
>>> object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind
>>>
>>
>> Hi Hu Tao,
>>
>> I have tested it use the above command, and yes, it failed.
>> And if i remove the *MCL_FUTURE* flag of mlockall(), it will work fine.
>>
>> *MCL_FUTURE*  Lock  all  pages  which  will become mapped into the
>> address space of the process in the future.(From man page)
>>
>> So i think we could remove this flag, and call mlockall(MCL_CURRENT)
>> every time when we do the above 'memory object add'.
>
> No that's wrong. We want allocations of qemu memory locked as well -
> not just guest memory.
>
>>>>
>>>> MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It
>>>> consumes more time to *move the memory*(i guess will reconstruct
>>>> pages and copy memory)which has been locked by mlockall. The result
>>>> is VM will start slower than the above scenario.
>>>
>>> I think your patch makes sense but it doesn't work for hotplugged
>>> memory. We need MPOL_MF_MOVE, too.
>>>
>>
>> Thank you very much, before send another patch, I will look into the
>> qemu process of hotplug memory, and try to solve this fault.:)
>
> Does setting MPOL_MF_MOVE solve all problems?
>

Hmm, if we don't consider its side-effect(VM will start slower),
That solves all problems. :)

>>>>
>>>> BTW, i think the follow process is clearer and more logical:
>>>> Allocate memory--->Set memory policy--->Lock memory.
>>>>
>>>> So what's your opinion? Thanks very much.
>>>>
>>>>
>>>>> -->
>>>>>
>>>>> hostmem: set MPOL_MF_MOVE
>>>>>
>>>>> When memory is allocated on a wrong node, MPOL_MF_STRICT
>>>>> doesn't move it - it just fails the allocation.
>>>>> A simple way to reproduce the failure is with mlock=on
>>>>> realtime feature.
>>>>>
>>>>> The code comment actually says: "ensure policy won't be ignored"
>>>>> so setting MPOL_MF_MOVE seems like a better way to do this.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>>>
>>>>> ---
>>>>>
>>>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>>>> index ca10c51..a9905c0 100644
>>>>> --- a/backends/hostmem.c
>>>>> +++ b/backends/hostmem.c
>>>>> @@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>>>>>           /* ensure policy won't be ignored in case memory is preallocated
>>>>>            * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
>>>>>            * this doesn't catch hugepage case. */
>>>>> -        unsigned flags = MPOL_MF_STRICT;
>>>>> +        unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
>>>>>
>>>>>           /* check for invalid host-nodes and policies and give more verbose
>>>>>            * error messages than mbind(). */
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>
> .
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-08-18  0:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-13 11:21 [Qemu-devel] [PATCH] mlock: fix bug when mlockall called before mbind zhanghailiang
2014-08-13 11:50 ` Michael S. Tsirkin
2014-08-14  6:31   ` zhanghailiang
2014-08-14  7:15     ` Hu Tao
2014-08-14  9:09       ` zhanghailiang
2014-08-14  9:56         ` Michael S. Tsirkin
2014-08-18  0:36           ` zhanghailiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).