Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id

From: Mike Rapoport <rppt@kernel.org>
To: Shijie Huang <shijie@amperemail.onmicrosoft.com>
Cc: Yury Norov <yury.norov@gmail.com>,
	Huang Shijie <shijie@os.amperecomputing.com>,
	gregkh@linuxfoundation.org, patches@amperecomputing.com,
	rafael@kernel.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, kuba@kernel.org, vschneid@redhat.com,
	mingo@kernel.org, akpm@linux-foundation.org, vbabka@suse.cz,
	tglx@linutronix.de, jpoimboe@kernel.org, ndesaulniers@google.com,
	mikelley@microsoft.com, mhiramat@kernel.org, arnd@arndb.de,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com,
	will@kernel.org, mark.rutland@arm.com, mpe@ellerman.id.au,
	linuxppc-dev@lists.ozlabs.org, chenhuacai@kernel.org,
	jiaxun.yang@flygoat.com, linux-mips@vger.kernel.org,
	cl@os.amperecomputing.com
Subject: Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
Date: Fri, 19 Jan 2024 10:42:04 +0200	[thread overview]
Message-ID: <Zao13I4Bb0tur0fZ@kernel.org> (raw)
In-Reply-To: <1cd078fd-c345-4d85-a92f-04c806c20efa@amperemail.onmicrosoft.com>

On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:
> 
> 在 2024/1/19 12:42, Yury Norov 写道:
> > This adds another level of indirection, I think. Currently cpu_to_node
> > is a simple inliner. After the patch it would be a real function with
> > all the associate overhead. Can you share a bloat-o-meter output here?
> #./scripts/bloat-o-meter vmlinux vmlinux.new
> add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580)
> Function                                     old     new   delta
> numa_update_cpu                              148     244     +96
> 
>  ...................................................................................................................................(to many to skip)
> 
> Total: Before=32990130, After=32990710, chg +0.00%

It's not only about text size, the indirect call also hurts performance

> > 
> > Regardless, I don't think that the approach is correct. As per your
> > description, some initialization functions erroneously call
> > cpu_to_node() instead of early_cpu_to_node() which exists specifically
> > for that case.
> > 
> > If the above correct, it's clearly a caller problem, and the fix is to
> > simply switch all those callers to use early version.
> 
> It is easy to change to early_cpu_to_node() for sched_init(),
> init_sched_fair_class()
> 
> and workqueue_init_early(). These three places call the cpu_to_node() in the
> __init function.
> 
> 
> But it is a little hard to change the early_trace_init(), since it calls
> cpu_to_node in the deep
> 
> function stack:
> 
>   early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer()
> 
> 
> For early_trace_init(), we need to change more code.
> 
> 
> Anyway, If we think it is not a good idea to change the common code, I am
> oaky too.

Is there a fundamental reason to have early_cpu_to_node() at all?
It seems that all the mappings are known by the end of setup_arch() and the
initialization of numa_node can be moved earlier. 

> > I would also initialize the numa_node with NUMA_NO_NODE at declaration,
> > so that if someone calls cpu_to_node() before the variable is properly
> > initialized at runtime, he'll get NO_NODE, which is obviously an error.
> 
> Even we set the numa_node with NUMA_NO_NODE, it does not always produce
> error.
> 
> Please see the alloc_pages_node().
> 
> 
> Thanks
> 
> Huang Shijie
> 

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv