qemu-devel.nongnu.org archive mirror
From: Andrew Jones <drjones@redhat.com>
To: gaowanlong@cn.fujitsu.com
Cc: aliguori@us.ibm.com, ehabkost@redhat.com, hutao@cn.fujitsu.com,
	peter huangpeng <peter.huangpeng@huawei.com>,
	qemu-devel@nongnu.org, bsd@redhat.com,
	Paolo Bonzini <pbonzini@redhat.com>,
	y-goto@jp.fujitsu.com, lcapitulino@redhat.com, lersek@redhat.com,
	afaerber@suse.de
Subject: Re: [Qemu-devel] [PATCH V9 06/12] NUMA: Add Linux libnuma detection
Date: Thu, 29 Aug 2013 04:31:53 -0400 (EDT)
Message-ID: <221143207.3452239.1377765113416.JavaMail.root@redhat.com>
In-Reply-To: <994970720.3441149.1377764149516.JavaMail.root@redhat.com>



----- Original Message -----
> 
> 
> ----- Original Message -----
> > On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > > Il 26/08/2013 10:43, Andrew Jones ha scritto:
> > >>
> > >> ----- Original Message -----
> > >>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> > >>>>>>>>>> Is this patch still necessary? I thought that dropping the
> > >>>>>>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
> > >>>>>>>>>>>>>> of the need for this library. Maybe I missed other uses?
> > >>>>>>>>>>
> > >>>>>>>>>> Yes, in 08/12 we also use mbind(),
> > >>>>>> You don't need a whole library for mbind(), it's a syscall. See
> > >>>>>> syscall(2).
> > >>>>>>
> > >>>>>>>>>> and in 09/12 we use max_numa_node().
> > >>>>>> Really? I didn't see it there. And anyway, that goes back to our
> > >>>>>> discussion
> > >>>>>> about setting qemu's MAX_NODES to whatever we think qemu should
> > >>>>>> support,
> > >>>>>> and then just checking that we don't blow that limit whenever
> > >>>>>> reading
> > >>>>>> host node info, i.e.
> > >>>>>>
> > >>>>>> maxnode = 0;
> > >>>>>> while (host_nodes[maxnode] && maxnode < MAX_NODES)
> > >>>>>>   node_read(&info[maxnode++]);
> > >>>>>>
> > >>>>>> type of a thing.
> > >>>>>>
> > >>>>>> And, if there's a place you really need to know the current online
> > >>>>>> number
> > >>>>>> of host nodes, then, like I said earlier, you should just go to
> > >>>>>> sysfs
> > >>>>>> yourself. libnuma:numa_max_node() returns an int that it only
> > >>>>>> initializes
> > >>>>>> at library load time, so it's not going to adapt to
> > >>>>>> onlining/offlining.
> > >>>>
> > >>>> OK, thank you.
> > >>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall
> > >>>> directly,
> > >>>> right?
> > >> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a
> > >> more
> > >> general lib. Whether or not we want to redefine those symbols within
> > >> qemu, in order to avoid the dependency on installing numactl-devel,
> > >> isn't
> > >> something I can answer. That's a better question for Anthony. Anthony?
> > >> Paolo,
> > >> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
> > >> linux-header synch script?
> > >>
> > > 
> > > I think using libnuma is fine.  In principle this could be used on other
> > > OSes than Linux, I think?
> > 
> > But seems that mbind(2) is Linux-specific syscall, right?
> > 
> 
> You would need to avoid calling mbind directly, i.e. use libnuma for all
> numa related calls. Then, if libnuma were to support more OSes, qemu would
> automatically support them (wrt numa) as well. Your mbind() with libnuma
> would look like this
> 
> numa_set_bind_policy(strict)
> numa_tonodemask_memory(addr, size, nodemask)
> 
> The problem is that set_bind_policy only takes a bool, and thus only
> allows two of the four possible policies
> 
> MPOL_BIND        strict == 1
> MPOL_PREFERRED   strict == 0
> 

Ah, there is a way to get the interleave policy:

if (policy == interleave) {
    numa_interleave_memory(addr, size, nodemask);
} else {
    /* strict == 1 -> MPOL_BIND, strict == 0 -> MPOL_PREFERRED */
    numa_set_bind_policy(strict);
    numa_tonodemask_memory(addr, size, nodemask);
}

That's a bit clunky, though. And I still don't see a way to select
MPOL_DEFAULT, nor a way to use any additional flags, such as
MPOL_F_RELATIVE_NODES.
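
For comparison, here is a rough sketch of what going straight to the
syscall might look like, so that all four policies and the mode flags are
reachable. The numeric values are the ones from the kernel's
uapi/linux/mempolicy.h; everything else (the QEMU_ prefix, QEMU_MAX_NODES,
policy_from_name(), nodemask_set(), qemu_mbind()) is made up for
illustration and isn't taken from the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values as defined in the kernel's uapi/linux/mempolicy.h, redefined
 * locally so numaif.h is not required. The QEMU_ prefix is hypothetical. */
#define QEMU_MPOL_DEFAULT          0
#define QEMU_MPOL_PREFERRED        1
#define QEMU_MPOL_BIND             2
#define QEMU_MPOL_INTERLEAVE       3
#define QEMU_MPOL_F_RELATIVE_NODES (1 << 14)

/* Hypothetical cap on the number of host nodes qemu would handle. */
#define QEMU_MAX_NODES 64
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define NODEMASK_LONGS ((QEMU_MAX_NODES + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Map a policy name to its MPOL_* value; returns -1 for unknown names. */
static int policy_from_name(const char *name)
{
    if (!strcmp(name, "default"))    return QEMU_MPOL_DEFAULT;
    if (!strcmp(name, "preferred"))  return QEMU_MPOL_PREFERRED;
    if (!strcmp(name, "bind"))       return QEMU_MPOL_BIND;
    if (!strcmp(name, "interleave")) return QEMU_MPOL_INTERLEAVE;
    return -1;
}

/* Set the bit for one host node in a nodemask of NODEMASK_LONGS words. */
static void nodemask_set(unsigned long *mask, unsigned int node)
{
    assert(node < QEMU_MAX_NODES);
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
}

/* Thin wrapper around the raw mbind syscall. Mode flags such as
 * QEMU_MPOL_F_RELATIVE_NODES are OR'ed into 'mode' by the caller. */
static long qemu_mbind(void *addr, unsigned long len, int mode,
                       const unsigned long *nodemask, unsigned long maxnode,
                       unsigned int flags)
{
#ifdef SYS_mbind
    return syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
#else
    errno = ENOSYS;   /* no mbind on this host */
    return -1;
#endif
}
```

A caller would then do something like
qemu_mbind(addr, size, QEMU_MPOL_INTERLEAVE | QEMU_MPOL_F_RELATIVE_NODES,
mask, QEMU_MAX_NODES, 0) and check for -1/errno, which is the part libnuma's
bool-only interface can't express.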


> So, due to libnuma's policy setting limitations, and the fact that it
> doesn't currently support any OS other than Linux, I prefer your current
> series version that drops libnuma. If qemu needs to support NUMA on
> another OS, then we can cross that bridge when we get there.
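
As for the sysfs suggestion quoted further up (reading the online host
nodes directly instead of relying on libnuma's load-time value), a minimal
sketch could look like the following. It assumes the usual list format of
/sys/devices/system/node/online, e.g. "0-3,5"; the helper names
max_id_in_list() and host_max_node() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs ID list such as "0-3,5\n" and return the highest ID,
 * or -1 if the list is empty or malformed. */
static int max_id_in_list(const char *list)
{
    const char *p = list;
    int max = -1;

    assert(list != NULL);
    while (*p && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p || lo < 0) {
            return -1;                /* not a non-negative number */
        }
        if (*end == '-') {            /* range "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo) {
                return -1;
            }
        }
        if (hi > max) {
            max = (int)hi;
        }
        if (*end == ',') {
            end++;                    /* move on to the next entry */
        } else if (*end && *end != '\n') {
            return -1;                /* unexpected character */
        }
        p = end;
    }
    return max;
}

/* Re-read the online node list on every call, so the result follows
 * onlining/offlining. Returns -1 if sysfs is unavailable. */
static int host_max_node(void)
{
    char buf[256];
    int max = -1;
    FILE *f = fopen("/sys/devices/system/node/online", "r");

    if (!f) {
        return -1;
    }
    if (fgets(buf, sizeof(buf), f)) {
        max = max_id_in_list(buf);
    }
    fclose(f);
    return max;
}
```

The result would still need to be checked against qemu's own MAX_NODES
before it is used to index anything.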


