From: Gavin Shan <gshan@redhat.com>
To: Andrew Jones <drjones@redhat.com>
Cc: peter.maydell@linaro.org, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, shan.gavin@gmail.com, ehabkost@redhat.com
Subject: Re: [PATCH 1/2] numa: Set default distance map if needed
Date: Fri, 8 Oct 2021 10:51:24 +1100
Message-ID: <9e51e29e-8792-18e7-7b38-68af15a3fdf5@redhat.com>
In-Reply-To: <20211006115643.p5b2qcoi4eagluqc@gator.home>

Hi Drew,

On 10/6/21 10:56 PM, Andrew Jones wrote:
> On Wed, Oct 06, 2021 at 10:03:25PM +1100, Gavin Shan wrote:
>> On 10/6/21 9:35 PM, Andrew Jones wrote:
>>> On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
>>>> The following option is used to specify the distance map. It's
>>>> possible that the option isn't provided by the user. In this case,
>>>> the distance map isn't populated and exposed to the platform. On the
>>>> other hand, empty NUMA nodes, where no memory resides, are
>>>> allowed on the ARM64 virt platform. For these empty NUMA nodes,
>>>> their corresponding device-tree nodes aren't populated, but
>>>> their NUMA IDs should be included in the "/distance-map"
>>>> device-tree node, so that the kernel can probe them properly if
>>>> device-tree is used.
>>>>
>>>>     -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>>>
>>>> So when the user doesn't specify a distance map, we need to generate
>>>> the default distance map, where the local and remote distances
>>>> are 10 and 20 respectively. This adds an extra parameter to the
>>>> existing complete_init_numa_distance() to generate the default
>>>> distance map for this case.
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>>    hw/core/numa.c | 13 +++++++++++--
>>>>    1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>>>> index 510d096a88..fdb3a4aeca 100644
>>>> --- a/hw/core/numa.c
>>>> +++ b/hw/core/numa.c
>>>> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>>>>        }
>>>>    }
>>>> -static void complete_init_numa_distance(MachineState *ms)
>>>> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>>>>    {
>>>>        int src, dst;
>>>>        NodeInfo *numa_info = ms->numa_state->nodes;
>>>> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>>>>                if (numa_info[src].distance[dst] == 0) {
>>>>                    if (src == dst) {
>>>>                        numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
>>>> +                } else if (is_default) {
>>>> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>>>>                    } else {
>>>>                        numa_info[src].distance[dst] = numa_info[dst].distance[src];
>>>>                    }
>>>> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>>>>             * A->B != distance B->A, then that means the distance table is
>>>>             * asymmetric. In this case, the distances for both directions
>>>>             * of all node pairs are required.
>>>> +         *
>>>> +         * The default node pair distances, which are 10 and 20 for the
>>>> +         * local and remote nodes respectively, are provided if the user
>>>> +         * doesn't specify any node pair distances.
>>>>             */
>>>>            if (ms->numa_state->have_numa_distance) {
>>>>                /* Validate enough NUMA distance information was provided. */
>>>>                validate_numa_distance(ms);
>>>>                /* Validation succeeded, now fill in any missing distances. */
>>>> -            complete_init_numa_distance(ms);
>>>> +            complete_init_numa_distance(ms, false);
>>>> +        } else {
>>>> +            complete_init_numa_distance(ms, true);
>>>> +            ms->numa_state->have_numa_distance = true;
>>>>            }
>>>>        }
>>>>    }
>>>> -- 
>>>> 2.23.0
>>>>
>>>
>>> With this patch we'll always generate a distance map when there's a numa
>>> config now. Is there any reason a user would not want to do that? I.e.
>>> should we still give the user the choice of presenting a distance map?
>>> Also, does the addition of a distance map in DTs for compat machine types
>>> matter?
>>>
>>> Otherwise patch looks good to me.
>>>
>>
>> Users needn't specify the distance map when the kernel's default one is
>> used. In Linux, the default distances are 10 and 20 for local and remote
>> nodes on all architectures and machines. The following option is still
>> usable to specify the distance map.
>>
>>    -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>
>> When the empty NUMA nodes are concerned, the distance map is mandatory
>> because their NUMA IDs are identified from there. So we always generate
>> the distance map as this patch does :)
>>
> 
> Yup, I knew all that already :-) I'm asking if we want to ensure the user
> can still control whether or not this distance map is generated at all. If
> a user doesn't want empty numa nodes or a distance map, then, with this
> patch, they cannot avoid the map's generation. That configurability
> question also relates to machine compatibility. Do we want to start
> generating this distance map on old, numa configured machine types? This
> patch will do that too.
> 
> But, it might be OK to just start generating this new DT node for all numa
> configured machine types and not allow the user to opt out. I do know that
> we allow hardware descriptions to be changed without compat code.  Also, a
> disable-auto-distance-map option may be considered useless and therefore
> not worth maintaining. The conservative in me says it's worth debating
> these things first though.
> 
> (Note, empty numa nodes have never worked with QEMU, so it's OK to start
>   erroring out when empty numa nodes and a disable-auto-distance-map option
>   are given together.)
> 

Sorry for the delay. I still don't fully understand "machine compatibility",
even after checking the surrounding code. Could you please provide more
details? I'm not sure whether enforcing the distance map for empty NUMA
nodes will cause any issues.
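
For reference, the fill logic that this patch gives
complete_init_numa_distance() can be sketched as a standalone function.
This is only a minimal sketch, not the QEMU code: fill_distances() and the
plain uint8_t table are hypothetical stand-ins for the NodeInfo array, and
MAX_NODES is an assumed bound. The values 10 and 20 are the local and
remote defaults described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES             128 /* assumed upper bound on NUMA nodes */
#define NUMA_DISTANCE_MIN     10  /* local distance, per the commit message */
#define NUMA_DISTANCE_DEFAULT 20  /* remote distance, per the commit message */

/*
 * Hypothetical helper: fill every unset (zero) entry of the
 * nb_nodes x nb_nodes distance table.
 *
 * - The diagonal always becomes NUMA_DISTANCE_MIN.
 * - With is_default set, off-diagonal entries become
 *   NUMA_DISTANCE_DEFAULT.
 * - Otherwise a missing entry is mirrored from the opposite direction.
 */
static void fill_distances(uint8_t dist[MAX_NODES][MAX_NODES],
                           int nb_nodes, bool is_default)
{
    int src, dst;

    assert(nb_nodes >= 0 && nb_nodes <= MAX_NODES);

    for (src = 0; src < nb_nodes; src++) {
        for (dst = 0; dst < nb_nodes; dst++) {
            if (dist[src][dst] != 0) {
                continue; /* already provided, keep it */
            }
            if (src == dst) {
                dist[src][dst] = NUMA_DISTANCE_MIN;
            } else if (is_default) {
                dist[src][dst] = NUMA_DISTANCE_DEFAULT;
            } else {
                dist[src][dst] = dist[dst][src];
            }
        }
    }
}
```

With is_default set and an all-zero 4-node table, every diagonal entry ends
up as 10 and every other entry as 20, so nodes without memory still get
entries in the table.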

Yes, empty NUMA nodes have never worked with QEMU when device-tree is used.
We still need to figure out a way to support memory hotplug through
device-tree, similar to what IBM's pSeries platform has. However, it works
when the ACPI table is used. Taking the following command line as an
example, the hot-added memory is always put into the last NUMA node (3).
The last NUMA node can be an empty node once the code is changed so that
the exported ACPI SRAT table includes the empty NUMA nodes.

    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
    -accel kvm -machine virt,gic-version=host               \
    -cpu host -smp 4,sockets=2,cores=2,threads=1            \
    -m 1024M,slots=16,maxmem=64G                            \
    -object memory-backend-ram,id=mem0,size=512M            \
    -object memory-backend-ram,id=mem1,size=512M            \
    -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
    -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
    -numa node,nodeid=2                                     \
    -numa node,nodeid=3
      :
      :
    guest# cat /sys/devices/system/node/node3/meminfo | grep MemTotal
    Node 3 MemTotal:              0 kB
    (qemu) object_add memory-backend-ram,id=hpmem0,size=1G
    (qemu) device_add pc-dimm,id=dimm1,memdev=hpmem0,node=3
    guest# cat /sys/devices/system/node/node3/meminfo | grep MemTotal
    Node 3 MemTotal:        1048576 kB

Thanks,
Gavin




