qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, groug@kaod.org
Subject: Re: [PATCH v2 3/6] spapr_numa: translate regular NUMA distance to PAPR distance
Date: Fri, 25 Sep 2020 09:44:14 -0300	[thread overview]
Message-ID: <5a80b9ae-3954-7e01-dcda-759ce50fe5e7@gmail.com> (raw)
In-Reply-To: <20200925023524.GQ2298@yekko.fritz.box>



On 9/24/20 11:35 PM, David Gibson wrote:
> On Thu, Sep 24, 2020 at 04:50:55PM -0300, Daniel Henrique Barboza wrote:
>> QEMU allows the user to set NUMA distances in the command line.
>> For ACPI architectures like x86, this means that user input is
>> used to populate the SLIT table, and the guest perceives the
>> distances as the user chooses to.
>>
>> PPC64 does not work that way. In the PAPR concept of NUMA,
>> associativity relations between the NUMA nodes are provided by
>> the device tree, and the guest kernel is free to calculate the
>> distances as it sees fit. Given how ACPI architectures works,
>> this puts the pSeries machine in a strange spot - users expect
>> to define NUMA distances like in the ACPI case, but QEMU does
>> not have control over it. To give pSeries users a similar
>> experience, we'll need to bring kernel specifics to QEMU
>> to approximate the NUMA distances.
>>
>> The pSeries kernel works with the NUMA distance range 10,
>> 20, 40, 80 and 160. The code starts at 10 (local distance) and
>> searches for a match in the first NUMA level between the
>> resources. If there is no match, the distance is doubled and
>> then it proceeds to try to match in the next NUMA level. Rinse
>> and repeat for MAX_DISTANCE_REF_POINTS levels.
>>
>> This patch introduces a spapr_numa_PAPRify_distances() helper
>> that translates the user distances to kernel distance, which
>> we're going to use to determine the associativity domains for
>> the NUMA nodes.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> 
> The idea of rounding the distances like this seems pretty good to me.
> Since each level is a multiple of a distance from the preivous one it
> might be more theoretically correct to place the thresholds at the
> geometric mean between each level, rather than the arithmetic mean.  I
> very much doubt it makes much different in practice though, and this
> is simpler.
> 
> There is one nit, I'm less happy with though..
> 
>> ---
>>   hw/ppc/spapr_numa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 44 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>> index fe395e80a3..990a5fce08 100644
>> --- a/hw/ppc/spapr_numa.c
>> +++ b/hw/ppc/spapr_numa.c
>> @@ -37,6 +37,49 @@ static bool spapr_numa_is_symmetrical(MachineState *ms)
>>       return true;
>>   }
>>   
>> +/*
>> + * This function will translate the user distances into
>> + * what the kernel understand as possible values: 10
>> + * (local distance), 20, 40, 80 and 160. Current heuristic
>> + * is:
>> + *
>> + *  - distances between 11 and 30 inclusive -> rounded to 20
>> + *  - distances between 31 and 60 inclusive -> rounded to 40
>> + *  - distances between 61 and 120 inclusive -> rounded to 80
>> + *  - everything above 120 -> 160
>> + *
>> + * This step can also be done in the same time as the NUMA
>> + * associativity domains calculation, at the cost of extra
>> + * complexity. We chose to keep it simpler.
>> + *
>> + * Note: this will overwrite the distance values in
>> + * ms->numa_state->nodes.
>> + */
>> +static void spapr_numa_PAPRify_distances(MachineState *ms)
>> +{
>> +    int src, dst;
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>> +
>> +    for (src = 0; src < nb_numa_nodes; src++) {
>> +        for (dst = src; dst < nb_numa_nodes; dst++) {
>> +            uint8_t distance = numa_info[src].distance[dst];
>> +            uint8_t rounded_distance = 160;
>> +
>> +            if (distance > 11 && distance <= 30) {
>> +                rounded_distance = 20;
>> +            } else if (distance > 31 && distance <= 60) {
>> +                rounded_distance = 40;
>> +            } else if (distance > 61 && distance <= 120) {
>> +                rounded_distance = 80;
>> +            }
>> +
>> +            numa_info[src].distance[dst] = rounded_distance;
>> +            numa_info[dst].distance[src] = rounded_distance;
> 
> ..I don't love the fact that we alter the distance table in place.
> Even though it was never exposed to the guest, I'd prefer not to
> destroy the information the user passed in.  It could lead to
> surprising results with QMP introspection, and it may make future
> extensions more difficult.
> 
> So I'd prefer to either (a) just leave the table as is and round
> on-demand with a paprify_distance(NN) -> {20,40,80,..} type function,
> or (b) create a parallel, spapr local, table with the rounded
> distances


I did something similar with (a) in the very first version of this series.
I'll fall back to on-demand translation logic to avoid changing numa_info.



Thanks,


DHB

> 
>> +        }
>> +    }
>> +}
>> +
>>   void spapr_numa_associativity_init(SpaprMachineState *spapr,
>>                                      MachineState *machine)
>>   {
>> @@ -95,6 +138,7 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
>>           exit(EXIT_FAILURE);
>>       }
>>   
>> +    spapr_numa_PAPRify_distances(machine);
>>   }
>>   
>>   void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
> 


  reply	other threads:[~2020-09-25 12:53 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-24 19:50 [PATCH v2 0/6] pseries NUMA distance calculation Daniel Henrique Barboza
2020-09-24 19:50 ` [PATCH v2 1/6] spapr: add spapr_machine_using_legacy_numa() helper Daniel Henrique Barboza
2020-09-24 19:50 ` [PATCH v2 2/6] spapr_numa: forbid asymmetrical NUMA setups Daniel Henrique Barboza
2020-09-25  2:36   ` David Gibson
2020-09-25  3:48   ` David Gibson
2020-09-25 12:41     ` Daniel Henrique Barboza
2020-09-26  7:49       ` David Gibson
2020-09-27 11:41         ` Daniel Henrique Barboza
2020-09-28  6:25           ` David Gibson
2020-09-24 19:50 ` [PATCH v2 3/6] spapr_numa: translate regular NUMA distance to PAPR distance Daniel Henrique Barboza
2020-09-25  2:35   ` David Gibson
2020-09-25 12:44     ` Daniel Henrique Barboza [this message]
2020-09-24 19:50 ` [PATCH v2 4/6] spapr_numa: change reference-points and maxdomain settings Daniel Henrique Barboza
2020-09-25  2:38   ` David Gibson
2020-09-25 13:16   ` Greg Kurz
2020-09-24 19:50 ` [PATCH v2 5/6] spapr_numa: consider user input when defining associativity Daniel Henrique Barboza
2020-09-25  3:39   ` David Gibson
2020-09-25 14:42     ` Daniel Henrique Barboza
2020-09-24 19:50 ` [PATCH v2 6/6] specs/ppc-spapr-numa: update with new NUMA support Daniel Henrique Barboza
2020-09-25  3:43   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a80b9ae-3954-7e01-dcda-759ce50fe5e7@gmail.com \
    --to=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).