From: Pierre Gondois <pierre.gondois@arm.com>
To: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, willy@infradead.org,
catalin.marinas@arm.com, dave.hansen@linux.intel.com,
tj@kernel.org, mingo@redhat.com
Subject: Re: [PATCH RFC 0/5] support NUMA emulation for arm64
Date: Mon, 23 Oct 2023 15:03:40 +0200
Message-ID: <bdd0b2c8-264d-404f-8c51-68cd9323f51a@arm.com>
In-Reply-To: <57eba42c-732a-4a30-a714-5e5538f2e5d5@linux.alibaba.com>

Hello Rongwei,

On 10/12/23 15:30, Rongwei Wang wrote:
>
> On 2023/10/12 20:37, Pierre Gondois wrote:
>> Hello Rongwei,
>>
>> On 10/12/23 04:48, Rongwei Wang wrote:
>>> A brief introduction
>>> ====================
>>>
>>> NUMA emulation can fake additional nodes on top of a single-node
>>> system, e.g.
>>>
>>> one node system:
>>>
>>> [root@localhost ~]# numactl -H
>>> available: 1 nodes (0)
>>> node 0 cpus: 0 1 2 3 4 5 6 7
>>> node 0 size: 31788 MB
>>> node 0 free: 31446 MB
>>> node distances:
>>> node 0
>>> 0: 10
>>>
>>> add numa=fake=2 (fake 2 nodes on each original node):
>>>
>>> [root@localhost ~]# numactl -H
>>> available: 2 nodes (0-1)
>>> node 0 cpus: 0 1 2 3 4 5 6 7
>>> node 0 size: 15806 MB
>>> node 0 free: 15451 MB
>>> node 1 cpus: 0 1 2 3 4 5 6 7
>>> node 1 size: 16029 MB
>>> node 1 free: 15989 MB
>>> node distances:
>>> node 0 1
>>> 0: 10 10
>>> 1: 10 10
>>>
>>> As shown above, a new node has been faked. As for CPUs, the behaviour
>>> of the x86 NUMA emulation is kept. Maybe it would be better for each
>>> node to have 4 cores (not sure; a next step if so).
>>>
>>> Why do this
>>> ===========
>>>
>>> There seem to be the following reasons:
>>> (1) On an x86 host, NUMA emulation can fake a multi-node environment
>>> to test or verify performance behaviour, but on arm64 the only
>>> way to do this is to modify the ACPI tables, which is more or
>>> less troublesome.
>>> (2) Reduce contention on some locks. Here is an example we found:
>>> will-it-scale/tlb_flush1_processes -t 96 -s 10 shows an obvious
>>> hotspot on lruvec->lock when tested in a single-node environment.
>>> What's more, performance improves greatly when tested on a system
>>> with two or more nodes. The data is shown below (more is better):
>>>
>>> ---------------------------------------------------------------------
>>> threads/process |    1    |    12    |    24    |    48    |    96
>>> ---------------------------------------------------------------------
>>> one node        | 14 1122 | 110 5372 | 111 2615 | 79 7084  | 72 4516
>>> ---------------------------------------------------------------------
>>> numa=fake=2     | 14 1168 | 144 4848 | 215 9070 | 157 0412 | 142 3968
>>> ---------------------------------------------------------------------
>>>                 | For concurrency 12, no lruvec->lock hotspot. For 24,
>>> hotspot         | one node has 24% hotspot on lruvec->lock, but
>>>                 | two nodes env hasn't.
>>> ---------------------------------------------------------------------
>>>
>>> As for risks (e.g. NUMA balancing...), they need to be discussed here.
>>>
>>> Lastly, this is just a draft; I can improve it further if it's acceptable.
>>
>> I'm not engaging on the utility/relevance of the patch-set, but I tried
>> them on an arm64 system with the 'numa=fake=2' parameter and could not
>
> Sorry, my fault.
>
> I should have mentioned this in the previous brief introduction:
> acpi=on numa=fake=2.
>
> The default path of arm64 NUMA initialization is numa_init() ->
> dummy_numa_init() if ACPI is turned off (this path has not been taken
> into account yet in this patch; I will handle it next).
>
> What's more, if you test this patchset in qemu-kvm, you should add
> the parameters below to the script.
>
> -object memory-backend-ram,id=mem0,size=32G \
> -numa node,memdev=mem0,cpus=0-7,nodeid=0 \
>
> (The parameters above just make sure the SRAT table carries a NUMA
> configuration, avoiding the numa_init() -> dummy_numa_init() path.)
>
>> see 2 nodes being created under:
>> /sys/devices/system/node/
>> Indeed it seems that even though numa_emulation() is moved to a generic
>> mm/numa.c file, the function is only called from:
>> arch/x86/mm/numa.c:numa_init()
>> (or maybe I'm misinterpreting the intent of the patches).
>
> Here drivers/base/arch_numa.c:numa_init() has called numa_emulation() (I
> guess it works if you add acpi=on :-)).

I don't see numa_emulation() being called from
drivers/base/arch_numa.c:numa_init(). I have:
$ git grep numa_emulation
arch/x86/mm/numa.c: numa_emulation(&numa_meminfo, numa_distance_cnt);
arch/x86/mm/numa_internal.h:extern void __init numa_emulation(struct numa_meminfo *numa_meminfo,
include/asm-generic/numa.h:void __init numa_emulation(struct numa_meminfo *numa_meminfo,
mm/numa.c:/* Most of this file comes from x86/numa_emulation.c */
mm/numa.c: * numa_emulation - Emulate NUMA nodes
mm/numa.c:void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
So from this, an arm64-based platform should not be able to call numa_emulation().
Would it be possible to add a call to dump_stack() in numa_emulation() to see the call stack?
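
The search above can also be narrowed to the files an arm64 build would actually compile. What follows is only a sketch under assumptions: the helper name find_calls is made up for illustration, it uses plain grep rather than git grep so it works on any checked-out tree, and the list of paths passed to it is a guess at where an arm64 call site would have to live.

```shell
#!/bin/sh
# find_calls SYMBOL PATH...
# (hypothetical helper, not part of the patch set)
# Print "file:line:text" for every line under PATH... that contains
# "SYMBOL(" -- this matches calls, declarations and definitions alike.
# Paths that do not exist are skipped. Returns non-zero if no line matched.
find_calls() {
    symbol=$1
    shift
    found=1
    for path in "$@"; do
        [ -e "$path" ] || continue
        # -r recurse, -n line numbers, -H always print the file name,
        # -F treat the pattern as a fixed string
        if grep -rnHF -- "${symbol}(" "$path"; then
            found=0
        fi
    done
    return "$found"
}
```

Run from the top of the kernel tree, for example as:

    find_calls numa_emulation drivers/base/arch_numa.c arch/arm64

A non-zero exit status would mean that none of those paths mentions numa_emulation().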
The branch I'm using is based on v6.6-rc5 and has the 5 patches applied:
2af398a87cc7 mm/numa: migrate leftover numa emulation into mm/numa.c
c8e314fb23be mm/numa: support CONFIG_NUMA_EMU for arm64
335b7219d40e arch_numa: remove __init in early_cpu_to_node()
d9358adf1cdc mm: percpu: fix variable type of cpu
1ffbe40a00f5 mm/numa: move numa emulation APIs into generic files
94f6f0550c62 (tag: v6.6-rc5) Linux 6.6-rc5
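
For completeness, a rough way to check after boot whether the emulation took effect is sketched below. The function names count_nodes and has_fake_numa are hypothetical, and the default paths are the usual sysfs and procfs locations; treat this as an illustration rather than a definitive test.

```shell
#!/bin/sh
# count_nodes [DIR]
# (hypothetical helper) Print how many nodeN directories exist under DIR,
# which defaults to the sysfs NUMA node directory.
count_nodes() {
    dir=${1:-/sys/devices/system/node}
    count=0
    for entry in "$dir"/node[0-9]*; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -d "$entry" ] || continue
        count=$((count + 1))
    done
    echo "$count"
}

# has_fake_numa [FILE]
# (hypothetical helper) Succeed if the kernel command line stored in FILE
# (default: /proc/cmdline) carries a numa=fake=... parameter.
has_fake_numa() {
    grep -qE '(^|[[:space:]])numa=fake=' "${1:-/proc/cmdline}"
}
```

If numa=fake=2 takes effect on a single-node machine, count_nodes should print 2, matching the "available: 2 nodes (0-1)" line of the numactl -H output quoted above.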
Regards,
Pierre
>
>
>>
>> Also I had the following errors when building (still for arm64):
>> mm/numa.c:862:8: error: implicit declaration of function
>> 'early_cpu_to_node' is invalid in C99
>> [-Werror,-Wimplicit-function-declaration]
>> nid = early_cpu_to_node(cpu);
>
> It seems CONFIG_DEBUG_PER_CPU_MAPS is enabled in your environment? You
> can disable CONFIG_DEBUG_PER_CPU_MAPS and test it again.
>
> I have not tested it with CONFIG_DEBUG_PER_CPU_MAPS enabled. This is
> very helpful; I will fix it next time.
>
> If you have any questions, please let me know.
>
> Regards,
>
> -wrw
>
>> ^
>> mm/numa.c:862:8: note: did you mean 'early_map_cpu_to_node'?
>> ./include/asm-generic/numa.h:37:13: note: 'early_map_cpu_to_node'
>> declared here
>> void __init early_map_cpu_to_node(unsigned int cpu, int nid);
>> ^
>> mm/numa.c:874:3: error: implicit declaration of function
>> 'debug_cpumask_set_cpu' is invalid in C99
>> [-Werror,-Wimplicit-function-declaration]
>> debug_cpumask_set_cpu(cpu, nid, enable);
>> ^
>> mm/numa.c:874:3: note: did you mean '__cpumask_set_cpu'?
>> ./include/linux/cpumask.h:474:29: note: '__cpumask_set_cpu' declared here
>> static __always_inline void __cpumask_set_cpu(unsigned int cpu, struct
>> cpumask *dstp)
>> ^
>> 2 errors generated.
>>
>> Regards,
>> Pierre
>>
>>>
>>> Thanks!
>>>
>>> Rongwei Wang (5):
>>> mm/numa: move numa emulation APIs into generic files
>>> mm: percpu: fix variable type of cpu
>>> arch_numa: remove __init in early_cpu_to_node()
>>> mm/numa: support CONFIG_NUMA_EMU for arm64
>>> mm/numa: migrate leftover numa emulation into mm/numa.c
>>>
>>> arch/x86/Kconfig | 8 -
>>> arch/x86/include/asm/numa.h | 3 -
>>> arch/x86/mm/Makefile | 1 -
>>> arch/x86/mm/numa.c | 216 +-------------
>>> arch/x86/mm/numa_internal.h | 14 +-
>>> drivers/base/arch_numa.c | 7 +-
>>> include/asm-generic/numa.h | 33 +++
>>> include/linux/percpu.h | 2 +-
>>> mm/Kconfig | 8 +
>>> mm/Makefile | 1 +
>>> arch/x86/mm/numa_emulation.c => mm/numa.c | 333 +++++++++++++++++++++-
>>> 11 files changed, 373 insertions(+), 253 deletions(-)
>>> rename arch/x86/mm/numa_emulation.c => mm/numa.c (63%)
>>>
Thread overview (16+ messages):
2023-10-12 2:48 [PATCH RFC 0/5] support NUMA emulation for arm64 Rongwei Wang
2023-10-12 2:48 ` [PATCH RFC 1/5] mm/numa: move numa emulation APIs into generic files Rongwei Wang
2023-10-12 6:05 ` Ingo Molnar
2023-10-12 2:48 ` [PATCH RFC 2/5] mm: percpu: fix variable type of cpu Rongwei Wang
2023-10-12 2:48 ` [PATCH RFC 3/5] arch_numa: remove __init in early_cpu_to_node() Rongwei Wang
2023-10-12 2:48 ` [PATCH RFC 4/5] mm/numa: support CONFIG_NUMA_EMU for arm64 Rongwei Wang
2023-10-12 2:48 ` [PATCH RFC 5/5] mm/numa: migrate leftover numa emulation into mm/numa.c Rongwei Wang
2023-10-12 12:37 ` [PATCH RFC 0/5] support NUMA emulation for arm64 Pierre Gondois
2023-10-12 13:30 ` Rongwei Wang
2023-10-23 13:03 ` Pierre Gondois [this message]
2024-02-20 11:36 ` [PATCH v1 0/2] support NUMA emulation for genertic arch Rongwei Wang
2024-02-20 11:36 ` [PATCH v1 1/2] arch_numa: remove __init for early_cpu_to_node Rongwei Wang
2024-02-20 11:36 ` [PATCH v1 2/2] numa: introduce numa emulation for genertic arch Rongwei Wang
2024-02-21 6:12 ` [PATCH v1 0/2] support NUMA " Mike Rapoport
2024-02-21 15:51 ` Pierre Gondois
2024-02-29 3:26 ` Rongwei Wang