Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Tao Xu <tao3.xu@intel.com>
To: Eric Blake <eblake@redhat.com>, daniel@linux.ibm.com
Cc: ehabkost@redhat.com, jingqi.liu@intel.com, fan.du@intel.com,
	qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
	jonathan.cameron@huawei.com, imammedo@redhat.com,
	dan.j.williams@intel.com
Subject: Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
Date: Wed, 14 Aug 2019 10:58:20 +0800	[thread overview]
Message-ID: <431d92ec-0033-bf9c-a47e-507af160c75b@intel.com> (raw)
In-Reply-To: <be305a8a-f1f4-7084-4bb7-4174d530483d@redhat.com>

On 8/13/2019 11:11 PM, Eric Blake wrote:
> On 8/9/19 1:57 AM, Tao wrote:
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-lb option to provide System Locality Latency and
>> Bandwidth Information. These memory attributes help to build
>> System Locality Latency and Bandwidth Information Structure(s)
>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> Changes in v9:
>>      - change the CLI input way, make it more user firendly (Daniel Black)
>>      use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
>>      the base-lat and base-bw input.
> 
> Why are you hand-rolling yet another scaling parser instead of reusing
> one that's already in-tree?

Because there are no time scaling parser and QMP 'size' type will use kb 
as default. It is a tricky issue because the entry in HMAT is small(max 
0xffff) and we need to store the unit in HMAT.

But as you mentioned blew, 'str' is not a good choice for QMP.
Therefore, what about this solution:

For bandwidth, reuse the qemu_strtosz_MiB() (because the smllest unit is 
MB/s). For latency, write a time scaling parser named as 
"qemu_strtotime_ps()" and "qemu_strtotime_ns()" in util/cutils.c. And 
then use it to pre-convert them into the single scale (QMP interface can 
use).

At last, in HMAT, we auto store the data, separate it into the same base 
unit and entry, and show error if overflow. Then the HMAT can support as 
large as possible.

I am wondering if this solution is OK.
> 
>> +++ b/hw/core/numa.c
> 
>> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
>> +                        Error **errp)
>> +{
> 
>> +    if (node->has_latency) {
>> +        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
>> +
>> +        if (!hmat_lb) {
>> +            hmat_lb = g_malloc0(sizeof(*hmat_lb));
>> +            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
>> +        } else if (hmat_lb->latency[node->initiator][node->target]) {
>> +            error_setg(errp, "Duplicate configuration of the latency for "
>> +                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
>> +                       node->initiator, node->target);
>> +            return;
>> +        }
>> +
>> +        ret = qemu_strtoui(node->latency, &endptr, 10, &latency);
>> +        if (ret < 0) {
>> +            error_setg(errp, "Invalid latency %s", node->latency);
>> +            return;
>> +        }
>> +
>> +        if (*endptr == '\0') {
>> +            base_lat = 1;
>> +        } else if (*(endptr + 1) == 's') {
>> +            switch (*endptr) {
>> +            case 'p':
>> +                base_lat = 1;
>> +                break;
>> +            case 'n':
>> +                base_lat = PICO_PER_NSEC;
>> +                break;
>> +            case 'u':
>> +                base_lat = PICO_PER_USEC;
>> +                break;
> 
> Hmm - this is a different scaling than any of our existing parsers
> (which assume multiples k/M/G..., not subdivisions u/n/s)
> 
> 
>> +    if (node->has_bandwidth) {
>> +        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
>> +
>> +        if (!hmat_lb) {
>> +            hmat_lb = g_malloc0(sizeof(*hmat_lb));
>> +            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
>> +        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
>> +            error_setg(errp, "Duplicate configuration of the bandwidth for "
>> +                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
>> +                       node->initiator, node->target);
>> +            return;
>> +        }
>> +
>> +        ret = qemu_strtoui(node->bandwidth, &endptr, 10, &bandwidth);
>> +        if (ret < 0) {
>> +            error_setg(errp, "Invalid bandwidth %s", node->bandwidth);
>> +            return;
>> +        }
>> +
>> +        switch (toupper(*endptr)) {
>> +        case '\0':
>> +        case 'M':
>> +            base_bw = 1;
>> +            break;
>> +        case 'G':
>> +            base_bw = UINT64_C(1) << 10;
>> +            break;
>> +        case 'P':
>> +            base_bw = UINT64_C(1) << 20;
>> +            break;
> 
> But this one, in addition to being wrong (P is 1<<30, not 1<<20), should
> definitely be reusing qemu_strtosz_metric() or similar (look in
> util/cutils.c).
> 
> 
>> +++ b/qapi/machine.json
>> @@ -377,10 +377,12 @@
>>   #
>>   # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>>   #
>> +# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
>> +#
>>   # Since: 2.1
>>   ##
>>   { 'enum': 'NumaOptionsType',
>> -  'data': [ 'node', 'dist', 'cpu' ] }
>> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>>   
> 
>> +##
>> +# @HmatLBDataType:
>> +#
>> +# Data type in the System Locality Latency
>> +# and Bandwidth Information Structure of HMAT (Heterogeneous
>> +# Memory Attribute Table)
>> +#
>> +# For more information of @HmatLBDataType see
>> +# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
>> +#
>> +# @access-latency: access latency (picoseconds)
>> +#
>> +# @read-latency: read latency (picoseconds)
>> +#
>> +# @write-latency: write latency (picoseconds)
>> +#
>> +# @access-bandwidth: access bandwidth (MB/s)
>> +#
>> +# @read-bandwidth: read bandwidth (MB/s)
>> +#
>> +# @write-bandwidth: write bandwidth (MB/s)
> 
> Are these really the best scales?
> 
> 
>> +
>> +##
>> +# @NumaHmatLBOptions:
>> +#
>> +# Set the system locality latency and bandwidth information
>> +# between Initiator and Target proximity Domains.
>> +#
>> +# For more information of @NumaHmatLBOptions see
>> +# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
>> +#
>> +# @initiator: the Initiator Proximity Domain.
>> +#
>> +# @target: the Target Proximity Domain.
>> +#
>> +# @hierarchy: the Memory Hierarchy. Indicates the performance
>> +#             of memory or side cache.
>> +#
>> +# @data-type: presents the type of data, access/read/write
>> +#             latency or hit latency.
>> +#
>> +# @latency: the value of latency from @initiator to @target proximity domain,
>> +#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
>> +#           "us(microsecond)".
>> +#
>> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
>> +#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "PB(/s)".
>> +#
>> +# Since: 4.2
>> +##
>> +{ 'struct': 'NumaHmatLBOptions',
>> +    'data': {
>> +    'initiator': 'uint16',
>> +    'target': 'uint16',
>> +    'hierarchy': 'HmatLBMemoryHierarchy',
>> +    'data-type': 'HmatLBDataType',
>> +    '*latency': 'str',
>> +    '*bandwidth': 'str' }}
> 
> ...and then parsing strings instead of taking raw integers?  Parsing
> strings is okay for HMP, but for QMP, our goal should be a single
> representation with no additional sugar on top.  Latency and bandwidth
> should be int in a single scale.
> 
> 
>> +++ b/qemu-options.hx
>> @@ -164,16 +164,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>       "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>       "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>       "-numa dist,src=source,dst=destination,val=distance\n"
>> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
>> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
> 
> Command-line parsing can then take human-written scaled numbers, and
> pre-convert them into the single scale accepted by the QMP interface.
>

next prev parent reply	other threads:[~2019-08-14  2:59 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
2019-08-13 21:55   ` Alistair Francis
2019-08-14  1:19     ` Andrew Jeffery
2019-08-13 21:55   ` Eduardo Habkost
2019-08-14 13:08     ` Cédric Le Goater
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
2019-08-13 15:00   ` Igor Mammedov
2019-08-14  2:24     ` Tao Xu
2019-08-16 14:47       ` Igor Mammedov
2019-08-14  2:39     ` Dan Williams
2019-08-14  5:13       ` Tao Xu
2019-08-14 21:29         ` Dan Williams
2019-08-15  1:56           ` Tao Xu
2019-08-15  2:31             ` Dan Williams
2019-08-16 14:57               ` Igor Mammedov
2019-08-20  8:34                 ` Tao Xu
2019-08-27 13:12                   ` Igor Mammedov
2019-08-28  1:09                     ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
2019-08-12  5:13   ` Daniel Black
2019-08-12  6:11     ` Tao Xu
2019-08-13 15:11   ` Eric Blake
2019-08-14  2:58     ` Tao Xu [this message]
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
2019-08-13  8:53 ` Tao Xu
2019-08-14 20:57   ` Eduardo Habkost
2019-08-15  0:53     ` Tao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=431d92ec-0033-bf9c-a47e-507af160c75b@intel.com \
    --to=tao3.xu@intel.com \
    --cc=armbru@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@linux.ibm.com \
    --cc=eblake@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=fan.du@intel.com \
    --cc=imammedo@redhat.com \
    --cc=jingqi.liu@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).