From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, groug@kaod.org
Subject: Re: [PATCH v2 3/6] spapr_numa: translate regular NUMA distance to PAPR distance
Date: Fri, 25 Sep 2020 09:44:14 -0300
Message-ID: <5a80b9ae-3954-7e01-dcda-759ce50fe5e7@gmail.com>
In-Reply-To: <20200925023524.GQ2298@yekko.fritz.box>
On 9/24/20 11:35 PM, David Gibson wrote:
> On Thu, Sep 24, 2020 at 04:50:55PM -0300, Daniel Henrique Barboza wrote:
>> QEMU allows the user to set NUMA distances in the command line.
>> For ACPI architectures like x86, this means that user input is
>> used to populate the SLIT table, and the guest perceives the
>> distances as the user chooses to.
>>
>> PPC64 does not work that way. In the PAPR concept of NUMA,
>> associativity relations between the NUMA nodes are provided by
>> the device tree, and the guest kernel is free to calculate the
>> distances as it sees fit. Given how ACPI architectures work,
>> this puts the pSeries machine in a strange spot - users expect
>> to define NUMA distances like in the ACPI case, but QEMU does
>> not have control over it. To give pSeries users a similar
>> experience, we'll need to bring kernel specifics to QEMU
>> to approximate the NUMA distances.
>>
>> The pSeries kernel works with the NUMA distance range 10,
>> 20, 40, 80 and 160. The code starts at 10 (local distance) and
>> searches for a match in the first NUMA level between the
>> resources. If there is no match, the distance is doubled and
>> then it proceeds to try to match in the next NUMA level. Rinse
>> and repeat for MAX_DISTANCE_REF_POINTS levels.
>>
>> This patch introduces a spapr_numa_PAPRify_distances() helper
>> that translates the user distances to kernel distances, which
>> we're going to use to determine the associativity domains for
>> the NUMA nodes.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>
> The idea of rounding the distances like this seems pretty good to me.
> Since each level is a multiple of the distance of the previous one, it
> might be more theoretically correct to place the thresholds at the
> geometric mean between each level, rather than the arithmetic mean. I
> very much doubt it makes much difference in practice, though, and this
> is simpler.
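For comparison, the geometric-mean thresholds mentioned above can be sketched as below. The function name is hypothetical and this is only an illustration of the suggestion. It keeps the patch's rule that any non-local distance (11 and up) rounds to at least 20, and it tests d <= sqrt(L * 2L) as d * d <= 2 * L * L to stay in integer arithmetic. The resulting upper bounds for 20, 40 and 80 are 28, 56 and 113, versus 30, 60 and 120 with arithmetic means.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical variant of the rounding heuristic: a distance d rounds
 * to level L (20, 40 or 80) when d <= sqrt(L * 2L), the geometric mean
 * of L and the next level. Squaring both sides avoids floating point.
 */
static uint8_t paprify_distance_geometric(uint8_t distance)
{
    static const unsigned levels[] = { 20, 40, 80 };

    assert(distance >= 10);      /* distances below 10 are not expected */
    if (distance == 10) {
        return 10;               /* the local distance is kept as-is */
    }
    for (unsigned i = 0; i < sizeof(levels) / sizeof(levels[0]); i++) {
        unsigned d = distance;

        if (d * d <= 2 * levels[i] * levels[i]) {
            return (uint8_t)levels[i];
        }
    }
    return 160;                  /* beyond sqrt(80 * 160), about 113.1 */
}
```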
>
> There is one nit I'm less happy with, though..
>
>> ---
>> hw/ppc/spapr_numa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 44 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>> index fe395e80a3..990a5fce08 100644
>> --- a/hw/ppc/spapr_numa.c
>> +++ b/hw/ppc/spapr_numa.c
>> @@ -37,6 +37,49 @@ static bool spapr_numa_is_symmetrical(MachineState *ms)
>>      return true;
>>  }
>>
>> +/*
>> + * This function will translate the user distances into
>> + * what the kernel understands as possible values: 10
>> + * (local distance), 20, 40, 80 and 160. Current heuristic
>> + * is:
>> + *
>> + * - distances between 11 and 30 inclusive -> rounded to 20
>> + * - distances between 31 and 60 inclusive -> rounded to 40
>> + * - distances between 61 and 120 inclusive -> rounded to 80
>> + * - everything above 120 -> 160
>> + *
>> + * This step can also be done at the same time as the NUMA
>> + * associativity domains calculation, at the cost of extra
>> + * complexity. We chose to keep it simpler.
>> + *
>> + * Note: this will overwrite the distance values in
>> + * ms->numa_state->nodes.
>> + */
>> +static void spapr_numa_PAPRify_distances(MachineState *ms)
>> +{
>> +    int src, dst;
>> +    int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    NodeInfo *numa_info = ms->numa_state->nodes;
>> +
>> +    for (src = 0; src < nb_numa_nodes; src++) {
>> +        for (dst = src; dst < nb_numa_nodes; dst++) {
>> +            uint8_t distance = numa_info[src].distance[dst];
>> +            uint8_t rounded_distance = 160;
>> +
>> +            if (distance >= 11 && distance <= 30) {
>> +                rounded_distance = 20;
>> +            } else if (distance >= 31 && distance <= 60) {
>> +                rounded_distance = 40;
>> +            } else if (distance >= 61 && distance <= 120) {
>> +                rounded_distance = 80;
>> +            }
>> +
>> +            numa_info[src].distance[dst] = rounded_distance;
>> +            numa_info[dst].distance[src] = rounded_distance;
>
> ..I don't love the fact that we alter the distance table in place.
> Even though it was never exposed to the guest, I'd prefer not to
> destroy the information the user passed in. It could lead to
> surprising results with QMP introspection, and it may make future
> extensions more difficult.
>
> So I'd prefer to either (a) just leave the table as is and round
> on-demand with a paprify_distance(NN) -> {20,40,80,..} type function,
> or (b) create a parallel, spapr-local table with the rounded
> distances.

I did something similar to (a) in the very first version of this series.
I'll fall back to on-demand translation logic to avoid changing numa_info.
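Option (a) could look roughly like the minimal sketch below. It is not taken from the series: `paprify_distance` is the hypothetical name used in the review, and the ranges are the ones listed in the patch's comment block, with inclusive lower bounds and the local distance 10 passed through unchanged. Because it returns a value instead of writing into numa_info, the user's table stays intact for QMP introspection.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a user-supplied NUMA distance to one of the values the pSeries
 * kernel can produce, using the ranges from the patch comment:
 *   10 -> 10, 11..30 -> 20, 31..60 -> 40, 61..120 -> 80, above -> 160
 * Nothing is written back, so the caller's distance table is untouched.
 */
static uint8_t paprify_distance(uint8_t distance)
{
    assert(distance >= 10);     /* distances below 10 are not expected */

    if (distance == 10) {
        return 10;              /* local distance */
    } else if (distance <= 30) {
        return 20;
    } else if (distance <= 60) {
        return 40;
    } else if (distance <= 120) {
        return 80;
    }
    return 160;
}
```

A caller would round at the point of use, for example paprify_distance(numa_info[src].distance[dst]) while computing associativity domains, leaving ms->numa_state->nodes exactly as the user set it.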
Thanks,
DHB
>
>> +        }
>> +    }
>> +}
>> +
>>  void spapr_numa_associativity_init(SpaprMachineState *spapr,
>>                                     MachineState *machine)
>>  {
>> @@ -95,6 +138,7 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
>>          exit(EXIT_FAILURE);
>>      }
>>
>> +    spapr_numa_PAPRify_distances(machine);
>>  }
>>
>>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
>
Thread overview: 20+ messages
2020-09-24 19:50 [PATCH v2 0/6] pseries NUMA distance calculation Daniel Henrique Barboza
2020-09-24 19:50 ` [PATCH v2 1/6] spapr: add spapr_machine_using_legacy_numa() helper Daniel Henrique Barboza
2020-09-24 19:50 ` [PATCH v2 2/6] spapr_numa: forbid asymmetrical NUMA setups Daniel Henrique Barboza
2020-09-25 2:36 ` David Gibson
2020-09-25 3:48 ` David Gibson
2020-09-25 12:41 ` Daniel Henrique Barboza
2020-09-26 7:49 ` David Gibson
2020-09-27 11:41 ` Daniel Henrique Barboza
2020-09-28 6:25 ` David Gibson
2020-09-24 19:50 ` [PATCH v2 3/6] spapr_numa: translate regular NUMA distance to PAPR distance Daniel Henrique Barboza
2020-09-25 2:35 ` David Gibson
2020-09-25 12:44 ` Daniel Henrique Barboza [this message]
2020-09-24 19:50 ` [PATCH v2 4/6] spapr_numa: change reference-points and maxdomain settings Daniel Henrique Barboza
2020-09-25 2:38 ` David Gibson
2020-09-25 13:16 ` Greg Kurz
2020-09-24 19:50 ` [PATCH v2 5/6] spapr_numa: consider user input when defining associativity Daniel Henrique Barboza
2020-09-25 3:39 ` David Gibson
2020-09-25 14:42 ` Daniel Henrique Barboza
2020-09-24 19:50 ` [PATCH v2 6/6] specs/ppc-spapr-numa: update with new NUMA support Daniel Henrique Barboza
2020-09-25 3:43 ` David Gibson