qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Alexander Graf <agraf@suse.de>
Subject: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
Date: Mon, 16 Jun 2014 11:49:46 -0700	[thread overview]
Message-ID: <20140616184945.GI16644@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140616161500.GD3222@otherpad.lan.raisama.net>

On 16.06.2014 [13:15:00 -0300], Eduardo Habkost wrote:
> On Mon, Jun 16, 2014 at 05:53:53PM +1000, Alexey Kardashevskiy wrote:
> > Currently NUMA nodes must go consequently and QEMU ignores nodes
> > with @nodeid bigger than the number of NUMA nodes in the command line.
> 
> Why would somebody need a NUMA node with nodeid bigger than the number
> of NUMA nodes? NUMA node IDs must be in the [0, numa_nodes-1] range.

That is not how the code works currently.

vl.c::numa_add()
...
        if (get_param_value(option, 128, "nodeid", optarg) == 0) {
            nodenr = nb_numa_nodes;
        } else {
            if (parse_uint_full(option, &nodenr, 10) < 0) {
                fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
                exit(1);
            }
        }
...
        if (get_param_value(option, 128, "mem", optarg) == 0) {
            node_mem[nodenr] = 0;
        } else {
            int64_t sval;
            sval = strtosz(option, &endptr);
            if (sval < 0 || *endptr) {
                fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
                exit(1);
            }
            node_mem[nodenr] = sval;
        }
...
	nb_numa_nodes++;
...

So if a user passes nodeid= to the NUMA node definition, that entry in
node_mem is set to the appropriate value, but nb_numa_nodes, which is
used to bound the iteration of that array is not bumped appropriately.
So we end up looking at arbitrary indices in the node_mem array, which
are often 0.

Note also that means that we can't generically differentiate here
between a user-defined memoryless node and one that happens to be 0
because the particular nodeid was not specified on the command-line.
Alexey, how do you differentiate between these two cases after your
patches? In patch 3, I see you check (and skip in the loop) explicitly
if !node_mem[nodeid], but I'm not sure how that check can differentiate
between the statically 0 (from main's intialization loop) and when a
user says a node's memory is 0. Probably something obvious I'm missing
(it is Monday after all)...

> > This prevents us from creating memory-less nodes which is possible
> > situation in POWERPC under pHyp or Sapphire.
> 
> Why? If I recall correctly, nodes without any CPUs or any memory are
> already allowed.

They are allowed, but it seems like the code throughout qemu (where it's
relevant) assumes that NUMA nodes are sequential and continuous, but
that's not required (nor is it enforced on the command-line).

> How exactly would this patch help you? How do you expect the
> command-line to look like for your use case?

Alexey has replied with that, it looks like.

> > This makes nb_numa_nodes a total number of nodes or the biggest node
> > number + 1 whichever is greater.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >  vl.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index ac0e3d7..f1b75cb 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -1356,7 +1356,7 @@ static void numa_add(const char *optarg)
> >          if (get_param_value(option, 128, "cpus", optarg) != 0) {
> >              numa_node_parse_cpus(nodenr, option);
> >          }
> > -        nb_numa_nodes++;
> > +        nb_numa_nodes = MAX(nb_numa_nodes + 1, nodenr + 1);
> 
> I would instead suggest that if any node in the [0, max_node_id] range
> is not present on the command-line, QEMU should instead reject the
> command-line.

We already check two things:

Too many nodes (meaning we've filled the array):
        if (nb_numa_nodes >= MAX_NODES) {
            fprintf(stderr, "qemu: too many NUMA nodes\n");
            exit(1);
        }

Node ID itself is out of range (due to the use of an array):
        if (nodenr >= MAX_NODES) {
            fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
            exit(1);
        }

But that doesn't prevent one from having *sparse* NUMA node IDs. And, as
far as I can tell, this is allowed by the spec, but isn't properly
supported by qemu.

Thanks,
Nish

  reply	other threads:[~2014-06-16 18:50 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-16  7:53 [Qemu-devel] [PATCH 0/7] spapr: rework memory nodes Alexey Kardashevskiy
2014-06-16  7:53 ` [Qemu-devel] [PATCH 1/7] spapr: Move DT memory node rendering to a helper Alexey Kardashevskiy
2014-06-16  7:53 ` [Qemu-devel] [PATCH 2/7] spapr: Use DT memory node rendering helper for other nodes Alexey Kardashevskiy
2014-06-16  7:53 ` [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory() Alexey Kardashevskiy
2014-06-18  5:04   ` Alexey Kardashevskiy
2014-06-20 19:10   ` Nishanth Aravamudan
2014-06-21  3:08     ` Alexey Kardashevskiy
2014-06-23 17:41       ` Nishanth Aravamudan
2014-06-23 22:02         ` Alexey Kardashevskiy
2014-06-20 22:55   ` Nishanth Aravamudan
2014-06-21  3:06     ` Alexey Kardashevskiy
2014-06-23 17:40       ` Nishanth Aravamudan
2014-06-24  6:07         ` Alexey Kardashevskiy
2014-06-24 17:07           ` Nishanth Aravamudan
2014-06-24  3:08       ` Nishanth Aravamudan
2014-06-24  6:14         ` Alexey Kardashevskiy
2014-06-24 17:01           ` Nishanth Aravamudan
2014-07-21 18:08           ` Nishanth Aravamudan
2014-06-16  7:53 ` [Qemu-devel] [PATCH 4/7] spapr: Split memory nodes to power-of-two blocks Alexey Kardashevskiy
2014-06-17  7:07   ` Alexey Kardashevskiy
2014-06-16  7:53 ` [Qemu-devel] [PATCH 5/7] spapr: Add a helper for node0_size calculation Alexey Kardashevskiy
2014-06-16 18:43   ` Nishanth Aravamudan
2014-06-16  7:53 ` [Qemu-devel] [PATCH 6/7] spapr: Fix ibm, associativity for memory nodes Alexey Kardashevskiy
2014-06-16  7:53 ` [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes Alexey Kardashevskiy
2014-06-16 16:15   ` Eduardo Habkost
2014-06-16 18:49     ` Nishanth Aravamudan [this message]
2014-06-16 20:11       ` Eduardo Habkost
2014-06-16 20:31         ` Eduardo Habkost
2014-06-17  0:21           ` Nishanth Aravamudan
2014-06-17  0:16         ` Nishanth Aravamudan
2014-06-16  8:16 ` [Qemu-devel] [PATCH 0/7] spapr: rework memory nodes Alexey Kardashevskiy
2014-06-16 18:26   ` Nishanth Aravamudan
2014-06-16 20:51   ` Eduardo Habkost
2014-06-17  0:25     ` Nishanth Aravamudan
2014-06-17  1:37       ` Eduardo Habkost
2014-06-17 18:36         ` Nishanth Aravamudan
2014-06-17  1:41       ` Eduardo Habkost
2014-06-17 18:37         ` Nishanth Aravamudan
2014-06-17  5:51     ` Alexey Kardashevskiy
2014-06-17 14:07       ` Eduardo Habkost
2014-06-17 18:38         ` Nishanth Aravamudan
2014-06-17 19:22           ` Eduardo Habkost
2014-06-18 18:28             ` Nishanth Aravamudan
2014-06-18 19:33               ` Eduardo Habkost
2014-06-18 23:58                 ` Nishanth Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140616184945.GI16644@linux.vnet.ibm.com \
    --to=nacc@linux.vnet.ibm.com \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=ehabkost@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).