From: Gavin Shan <gshan@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: peter.maydell@linaro.org, drjones@redhat.com,
richard.henderson@linaro.org, qemu-devel@nongnu.org,
"wangyanan \(Y\)" <wangyanan55@huawei.com>,
qemu-arm@nongnu.org, shan.gavin@gmail.com
Subject: Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Date: Fri, 25 Feb 2022 16:41:43 +0800 [thread overview]
Message-ID: <211c23f8-b5bd-219d-e584-20a0b700919d@redhat.com> (raw)
In-Reply-To: <aa22b9ba-6b5a-a728-870d-e5efbea67c5d@redhat.com>
Hi Igor,
On 2/17/22 10:14 AM, Gavin Shan wrote:
> On 1/26/22 5:14 PM, Igor Mammedov wrote:
>> On Wed, 26 Jan 2022 13:24:10 +0800
>> Gavin Shan <gshan@redhat.com> wrote:
>>
>>> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
>>> when it isn't provided explicitly. However, the CPU topology isn't fully
>>> considered in the default association and it causes CPU topology broken
>>> warnings on booting Linux guest.
>>>
>>> For example, the following warning messages are observed when the Linux guest
>>> is booted with the following command lines.
>>>
>>> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>> -accel kvm -machine virt,gic-version=host \
>>> -cpu host \
>>> -smp 6,sockets=2,cores=3,threads=1 \
>>> -m 1024M,slots=16,maxmem=64G \
>>> -object memory-backend-ram,id=mem0,size=128M \
>>> -object memory-backend-ram,id=mem1,size=128M \
>>> -object memory-backend-ram,id=mem2,size=128M \
>>> -object memory-backend-ram,id=mem3,size=128M \
>>> -object memory-backend-ram,id=mem4,size=128M \
>>> -object memory-backend-ram,id=mem4,size=384M \
>>> -numa node,nodeid=0,memdev=mem0 \
>>> -numa node,nodeid=1,memdev=mem1 \
>>> -numa node,nodeid=2,memdev=mem2 \
>>> -numa node,nodeid=3,memdev=mem3 \
>>> -numa node,nodeid=4,memdev=mem4 \
>>> -numa node,nodeid=5,memdev=mem5
>>> :
>>> alternatives: patching kernel code
>>> BUG: arch topology borken
>>> the CLS domain not a subset of the MC domain
>>> <the above error log repeats>
>>> BUG: arch topology borken
>>> the DIE domain not a subset of the NODE domain
>>>
>>> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
>>> are associated with NODE#0 to NODE#5 separately. That's incorrect because
>>> CPU#0/1/2 should be associated with same NUMA node because they're seated
>>> in same socket.
>>>
>>> This fixes the issue by considering the socket when default CPU-to-NUMA
>>> is given. With this applied, no more CPU topology broken warnings are seen
>>> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
>>> no CPUs associated with NODE#2/3/4/5.
>>
>>> From migration point of view it looks fine to me, and doesn't need a compat knob
>> since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't
>> version those unless something is broken by it).
>>
>>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> ---
>>> hw/arm/virt.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 141350bf21..b4a95522d3 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>> static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>>> {
>>> - return idx % ms->numa_state->num_nodes;
>>> + return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>>
>> I'd like for ARM folks to confirm whether above is correct
>> (i.e. socket is NUMA node boundary and also if above topo vars
>> could have odd values. Don't look at horribly complicated x86
>> as example, but it showed that vendors could stash pretty much
>> anything there, so we should consider it here as well and maybe
>> forbid that in smp virt-arm parser)
>>
>
> After doing some investigation, I don't think the socket is NUMA node boundary.
> Unfortunately, I didn't find it's documented like this in any documents after
> checking device-tree specification, Linux CPU topology and NUMA binding documents.
>
> However, there are two options here according to Linux (guest) kernel code:
> (A) socket is NUMA node boundary (B) CPU die is NUMA node boundary. They are
> equivalent as CPU die isn't supported on arm/virt machine. Besides, the topology
> of one-to-one association between socket and NUMA node sounds natural and simplified.
> So I think (A) is the best way to go.
>
> Another thing I want to explain here is how the changes affect the memory
> allocation in Linux guest. Taking the command lines included in the commit
> log as an example, the first two NUMA nodes are bound to CPUs while the other
> 4 NUMA nodes are regarded as remote NUMA nodes to CPUs. The remote NUMA node
> won't accommodate the memory allocation until the memory in the near (local)
> NUMA node becomes exhausted. However, it's uncertain how the memory is hosted
> if memory binding isn't applied.
>
> Besides, I think the code should be improved like below to avoid overflow on
> ms->numa_state->num_nodes.
>
> static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
> {
> - return idx % ms->numa_state->num_nodes;
> + int node_idx;
> +
> + node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> + return node_idx % ms->numa_state->num_nodes;
> }
>
>
Kindly ping...
>>> }
>>> static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>
Thanks,
Gavin
next prev parent reply other threads:[~2022-02-25 9:09 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-26 5:24 [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID Gavin Shan
2022-01-26 7:41 ` Andrew Jones
2022-01-26 9:14 ` Igor Mammedov
2022-01-28 7:05 ` wangyanan (Y) via
2022-02-15 8:19 ` Gavin Shan
2022-02-15 8:32 ` Andrew Jones
2022-02-16 10:58 ` Gavin Shan
2022-02-08 14:49 ` Peter Maydell
2022-02-17 2:14 ` Gavin Shan
2022-02-25 8:41 ` Gavin Shan [this message]
2022-02-25 10:03 ` Igor Mammedov
2022-02-28 4:26 ` Gavin Shan
2022-02-28 10:54 ` Igor Mammedov
2022-03-01 9:14 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=211c23f8-b5bd-219d-e584-20a0b700919d@redhat.com \
--to=gshan@redhat.com \
--cc=drjones@redhat.com \
--cc=imammedo@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=shan.gavin@gmail.com \
--cc=wangyanan55@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).