From: Tao Xu <tao3.xu@intel.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: ehabkost@redhat.com, jingqi.liu@intel.com, fan.du@intel.com,
qemu-devel@nongnu.org, daniel@linux.ibm.com,
jonathan.cameron@huawei.com, dan.j.williams@intel.com
Subject: Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
Date: Wed, 14 Aug 2019 10:24:03 +0800 [thread overview]
Message-ID: <24976688-d8f1-4de0-870d-73b96c20c300@intel.com> (raw)
In-Reply-To: <20190813170027.0617b129@redhat.com>
On 8/13/2019 11:00 PM, Igor Mammedov wrote:
> On Fri, 9 Aug 2019 14:57:25 +0800
> Tao <tao3.xu@intel.com> wrote:
>
>> From: Tao Xu <tao3.xu@intel.com>
>>
>> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
>> The initiator represents processor which access to memory. And in 5.2.27.3
>> Memory Proximity Domain Attributes Structure, the attached initiator is
>> defined as where the memory controller responsible for a memory proximity
>> domain. With attached initiator information, the topology of heterogeneous
>> memory can be described.
>>
>> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
>> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
>> the platform's HMAT tables.
>>
>> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> No changes in v9
>> ---
[...]
>> +
>> + for (i = 0; i < machine->numa_state->num_nodes; i++) {
>> + if (numa_info[i].initiator_valid &&
>> + !numa_info[numa_info[i].initiator].has_cpu) {
> ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
>
I will add a error "if (numa_info[i].initiator >= MAX_NODES)" when input.
>> + error_report("The initiator-id %"PRIu16 " of NUMA node %d"
>> + " does not exist.", numa_info[i].initiator, i);
>> + error_printf("\n");
>> +
>> + exit(1);
>> + }
> it takes care only about nodes that have cpus or memory-only ones that have
> initiator explicitly provided on CLI. And leaves possibility to have
> memory-only nodes without initiator mixed with nodes that have initiator.
> Is it valid to have mixed configuration?
> Should we forbid it?
>
Mixed configuration may indeed trigger bug in the future. Because in
this patches we default generate HMAT. But mixed configuration situation
or without initiator setting will let mem-only node "Flags" field 0,
then the Proximity Domain for the Attached Initiator field is not
valid.
List are three situations:
1) full configuration, just like
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2,initiator=0
2) mixed configuration, just like
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa node,nodeid=2,memdev=m2
3) no configuration, just like
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-object memory-backend-ram,size=1G,id=m2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1 \
-numa node,nodeid=2,memdev=m2
I have 3 ideas:
1. HMAT option. Add a machine option like "-machine,hmat=yes", then qemu
can have HMAT.
2. Default setting. The numa without initiator default set numa node
which has cpu 0 as initiator.
3. Auto setting. intelligent auto configuration like
numa_default_auto_assign_ram, auto set initiator of the memory-only
nodes averagely.
Therefore, there are 2 different solution:
1) HMAT option + Default setting
2) HMAT option + Auto setting
>> + }
>> +
>> if (s->len && !qtest_enabled()) {
>> warn_report("CPU(s) not present in any NUMA nodes: %s",
>> s->str);
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index 8fcbba05d6..cfb6339810 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>> numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>> numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>> }
>> +
>> + if (node->has_initiator) {
>> + if (numa_info[nodenr].initiator_valid &&
>> + (node->initiator != numa_info[nodenr].initiator)) {
>> + error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
>> + "set to node %" PRIu16, nodenr,
>> + numa_info[nodenr].initiator);
>> + return;
>> + }
>> +
>> + numa_info[nodenr].initiator_valid = true;
>> + numa_info[nodenr].initiator = node->initiator;
> ^^^
> not validated user input? (which could lead to read beyond numa_info[] boundaries
> in previous hunk).
>
>> + }
>> numa_info[nodenr].present = true;
>> max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>> ms->numa_state->num_nodes++;
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 76da3016db..46ad06e000 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -10,6 +10,9 @@ struct NodeInfo {
>> uint64_t node_mem;
>> struct HostMemoryBackend *node_memdev;
>> bool present;
>> + bool has_cpu;
>> + bool initiator_valid;
>> + uint16_t initiator;
>> uint8_t distance[MAX_NODES];
>> };
>>
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 6db8a7e2ec..05e367d26a 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -414,6 +414,9 @@
>> # @memdev: memory backend object. If specified for one node,
>> # it must be specified for all nodes.
>> #
>> +# @initiator: the initiator numa nodeid that is closest (as in directly
>> +# attached) to this numa node (since 4.2)
> well, it's pretty unclear what doc comment means (unless reader knows well
> specific part of ACPI spec)
>
> suggest to rephrase to something more understandable for unaware
> readers (+ possible reference to spec for those who is interested
> in spec definition since this doc is meant for developers).
>
>> +#
>> # Since: 2.1
>> ##
>> { 'struct': 'NumaNodeOptions',
>> @@ -421,7 +424,8 @@
>> '*nodeid': 'uint16',
>> '*cpus': ['uint16'],
>> '*mem': 'size',
>> - '*memdev': 'str' }}
>> + '*memdev': 'str',
>> + '*initiator': 'uint16' }}
>>
>> ##
>> # @NumaDistOptions:
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 9621e934c0..c480781992 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>> ETEXI
>>
>> DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>> - "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>> - "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>> + "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> + "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>> "-numa dist,src=source,dst=destination,val=distance\n"
>> "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>> QEMU_ARCH_ALL)
>> STEXI
>> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>> @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>> @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>> @findex -numa
>> @@ -215,6 +215,25 @@ split equally between them.
>> @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>> if one node uses @samp{memdev}, all of them have to use it.
>>
>> +@samp{initiator} indicate the initiator NUMA @var{initiator} that is
> ^^^^^^^ ^^^^^^^^^^^^^^
> above will result in "initiator NUMA initiator", was it your intention?
>
>> +closest (as in directly attached) to this NUMA @var{node}.
> Again suggest replace spec language with something more user friendly
> (this time without spec reference as it's geared for end user)
>
>> +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.
> Following example creates a machine with 2 NUMA ...
>
>> +node 1 has only memory, and its' initiator is node 0. Note that because
>> +node 0 has CPU, by default the initiator of node 0 is itself and must be
>> +itself.
>> +@example
>> +-M pc \
>> +-m 2G,slots=2,maxmem=4G \
>> +-object memory-backend-ram,size=1G,id=m0 \
>> +-object memory-backend-ram,size=1G,id=m1 \
>> +-numa node,nodeid=0,memdev=m0 \
>> +-numa node,nodeid=1,memdev=m1,initiator=0 \
>> +-smp 2,sockets=2,maxcpus=2 \
>> +-numa cpu,node-id=0,socket-id=0 \
>> +-numa cpu,node-id=0,socket-id=1 \
>> +@end example
>> +
>> @var{source} and @var{destination} are NUMA node IDs.
>> @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>> The distance from a node to itself is always 10. If any pair of nodes is
>
next prev parent reply other threads:[~2019-08-14 2:25 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-09 6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
2019-08-13 21:55 ` Alistair Francis
2019-08-14 1:19 ` Andrew Jeffery
2019-08-13 21:55 ` Eduardo Habkost
2019-08-14 13:08 ` Cédric Le Goater
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance " Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
2019-08-13 15:00 ` Igor Mammedov
2019-08-14 2:24 ` Tao Xu [this message]
2019-08-16 14:47 ` Igor Mammedov
2019-08-14 2:39 ` Dan Williams
2019-08-14 5:13 ` Tao Xu
2019-08-14 21:29 ` Dan Williams
2019-08-15 1:56 ` Tao Xu
2019-08-15 2:31 ` Dan Williams
2019-08-16 14:57 ` Igor Mammedov
2019-08-20 8:34 ` Tao Xu
2019-08-27 13:12 ` Igor Mammedov
2019-08-28 1:09 ` Tao Xu
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
2019-08-12 5:13 ` Daniel Black
2019-08-12 6:11 ` Tao Xu
2019-08-13 15:11 ` Eric Blake
2019-08-14 2:58 ` Tao Xu
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
2019-08-09 6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
2019-08-13 8:53 ` Tao Xu
2019-08-14 20:57 ` Eduardo Habkost
2019-08-15 0:53 ` Tao Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=24976688-d8f1-4de0-870d-73b96c20c300@intel.com \
--to=tao3.xu@intel.com \
--cc=dan.j.williams@intel.com \
--cc=daniel@linux.ibm.com \
--cc=ehabkost@redhat.com \
--cc=fan.du@intel.com \
--cc=imammedo@redhat.com \
--cc=jingqi.liu@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).