qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Gavin Shan <gshan@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: peter.maydell@linaro.org, drjones@redhat.com,
	richard.henderson@linaro.org, qemu-devel@nongnu.org,
	"wangyanan \(Y\)" <wangyanan55@huawei.com>,
	qemu-arm@nongnu.org, shan.gavin@gmail.com
Subject: Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Date: Fri, 25 Feb 2022 16:41:43 +0800	[thread overview]
Message-ID: <211c23f8-b5bd-219d-e584-20a0b700919d@redhat.com> (raw)
In-Reply-To: <aa22b9ba-6b5a-a728-870d-e5efbea67c5d@redhat.com>

Hi Igor,

On 2/17/22 10:14 AM, Gavin Shan wrote:
> On 1/26/22 5:14 PM, Igor Mammedov wrote:
>> On Wed, 26 Jan 2022 13:24:10 +0800
>> Gavin Shan <gshan@redhat.com> wrote:
>>
>>> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
>>> when it isn't provided explicitly. However, the CPU topology isn't fully
>>> considered in the default association and it causes CPU topology broken
>>> warnings on booting Linux guest.
>>>
>>> For example, the following warning messages are observed when the Linux guest
>>> is booted with the following command lines.
>>>
>>>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>>    -accel kvm -machine virt,gic-version=host               \
>>>    -cpu host                                               \
>>>    -smp 6,sockets=2,cores=3,threads=1                      \
>>>    -m 1024M,slots=16,maxmem=64G                            \
>>>    -object memory-backend-ram,id=mem0,size=128M            \
>>>    -object memory-backend-ram,id=mem1,size=128M            \
>>>    -object memory-backend-ram,id=mem2,size=128M            \
>>>    -object memory-backend-ram,id=mem3,size=128M            \
>>>    -object memory-backend-ram,id=mem4,size=128M            \
>>>    -object memory-backend-ram,id=mem4,size=384M            \
>>>    -numa node,nodeid=0,memdev=mem0                         \
>>>    -numa node,nodeid=1,memdev=mem1                         \
>>>    -numa node,nodeid=2,memdev=mem2                         \
>>>    -numa node,nodeid=3,memdev=mem3                         \
>>>    -numa node,nodeid=4,memdev=mem4                         \
>>>    -numa node,nodeid=5,memdev=mem5
>>>           :
>>>    alternatives: patching kernel code
>>>    BUG: arch topology borken
>>>    the CLS domain not a subset of the MC domain
>>>    <the above error log repeats>
>>>    BUG: arch topology borken
>>>    the DIE domain not a subset of the NODE domain
>>>
>>> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
>>> are associated with NODE#0 to NODE#5 separately. That's incorrect because
>>> CPU#0/1/2 should be associated with same NUMA node because they're seated
>>> in same socket.
>>>
>>> This fixes the issue by considering the socket when default CPU-to-NUMA
>>> is given. With this applied, no more CPU topology broken warnings are seen
>>> from the Linux guest. The 6 CPUs are associated with NODE#0/1, but there are
>>> no CPUs associated with NODE#2/3/4/5.
>>
>>> From migration point of view it looks fine to me, and doesn't need a compat knob
>> since NUMA data (on virt-arm) only used to construct ACPI tables (and we don't
>> version those unless something is broken by it).
>>
>>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> ---
>>>   hw/arm/virt.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 141350bf21..b4a95522d3 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>>   static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>>>   {
>>> -    return idx % ms->numa_state->num_nodes;
>>> +    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>>
>> I'd like for ARM folks to confirm whether above is correct
>> (i.e. socket is NUMA node boundary and also if above topo vars
>> could have odd values. Don't look at horribly complicated x86
>> as example, but it showed that vendors could stash pretty much
>> anything there, so we should consider it here as well and maybe
>> forbid that in smp virt-arm parser)
>>
> 
> After doing some investigation, I don't think the socket is NUMA node boundary.
> Unfortunately, I didn't find it's documented like this in any documents after
> checking device-tree specification, Linux CPU topology and NUMA binding documents.
> 
> However, there are two options here according to Linux (guest) kernel code:
> (A) socket is NUMA node boundary  (B) CPU die is NUMA node boundary. They are
> equivalent as CPU die isn't supported on arm/virt machine. Besides, the topology
> of one-to-one association between socket and NUMA node sounds natural and simplified.
> So I think (A) is the best way to go.
> 
> Another thing I want to explain here is how the changes affect the memory
> allocation in Linux guest. Taking the command lines included in the commit
> log as an example, the first two NUMA nodes are bound to CPUs while the other
> 4 NUMA nodes are regarded as remote NUMA nodes to CPUs. The remote NUMA node
> won't accommodate the memory allocation until the memory in the near (local)
> NUMA node becomes exhausted. However, it's uncertain how the memory is hosted
> if memory binding isn't applied.
> 
> Besides, I think the code should be improved like below to avoid overflow on
> ms->numa_state->num_nodes.
> 
>   static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>   {
> -    return idx % ms->numa_state->num_nodes;
> +    int node_idx;
> +
> +    node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> +    return node_idx % ms->numa_state->num_nodes;
>   }
> 
> 

Kindly ping...

>>>   }
>>>   static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>

Thanks,
Gavin



  reply	other threads:[~2022-02-25  9:09 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26  5:24 [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID Gavin Shan
2022-01-26  7:41 ` Andrew Jones
2022-01-26  9:14 ` Igor Mammedov
2022-01-28  7:05   ` wangyanan (Y) via
2022-02-15  8:19     ` Gavin Shan
2022-02-15  8:32       ` Andrew Jones
2022-02-16 10:58         ` Gavin Shan
2022-02-08 14:49   ` Peter Maydell
2022-02-17  2:14   ` Gavin Shan
2022-02-25  8:41     ` Gavin Shan [this message]
2022-02-25 10:03       ` Igor Mammedov
2022-02-28  4:26         ` Gavin Shan
2022-02-28 10:54           ` Igor Mammedov
2022-03-01  9:14             ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211c23f8-b5bd-219d-e584-20a0b700919d@redhat.com \
    --to=gshan@redhat.com \
    --cc=drjones@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=shan.gavin@gmail.com \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).