From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 8 Jul 2015 16:16:23 -0700
From: Nishanth Aravamudan
To: Michael Ellerman
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Paul Mackerras,
	Anton Blanchard, David Rientjes, linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot
Message-ID: <20150708231623.GB44862@linux.vnet.ibm.com>
References: <20150702230202.GA2807@linux.vnet.ibm.com>
	<20150708040056.948A1140770@ozlabs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20150708040056.948A1140770@ozlabs.org>
List-Id: Linux on PowerPC Developers Mail List

On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
> On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
> > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
> > have an ordering issue during boot
> > with early calls to cpu_to_node().
>
> "now that .." implies we changed something and broke this. What commit was
> it that changed the behaviour?

Well, that's something I'm still trying to unearth. In the commits before
and after adding USE_PERCPU_NUMA_NODE_ID (8c272261194d "powerpc/numa: Enable
USE_PERCPU_NUMA_NODE_ID"), the dmesg reports:

pcpu-alloc: [0] 0 1 2 3 4 5 6 7

At least prior to 8c272261194d, this might have been due to the old
powerpc-specific cpu_to_node():

static inline int cpu_to_node(int cpu)
{
	int nid;

	nid = numa_cpu_lookup_table[cpu];

	/*
	 * During early boot, the numa-cpu lookup table might not have been
	 * setup for all CPUs yet. In such cases, default to node 0.
	 */
	return (nid < 0) ? 0 : nid;
}

which might imply that no one cares or that simply no one noticed.

> > The value returned by those calls now depends on the per-cpu area being
> > setup, but that is not guaranteed to be the case during boot. Instead,
> > we need to add an early_cpu_to_node() which doesn't use the per-CPU area
> > and call that from certain spots that are known to invoke cpu_to_node()
> > before the per-CPU areas are configured.
> >
> > On an example 2-node NUMA system with the following topology:
> >
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3
> > node 0 size: 2029 MB
> > node 0 free: 1753 MB
> > node 1 cpus: 4 5 6 7
> > node 1 size: 2045 MB
> > node 1 free: 1945 MB
> > node distances:
> > node   0   1
> >   0:  10  40
> >   1:  40  10
> >
> > we currently emit at boot:
> >
> > [    0.000000] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7
> >
> > After this commit, we correctly emit:
> >
> > [    0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7
> >
>
> So it looks fairly sane, and I guess it's a bug fix.
>
> But I'm a bit reluctant to put it in straight away without some time in next.

I'm fine with that -- it could use some more extensive testing, admittedly
(so far I have only been able to verify that the pcpu areas are being
allocated on the right node).
I still need to test with hotplug and things like that. Hence the RFC.

> It looks like the symptom is that the per-cpu areas are all allocated on node
> 0, is that all that goes wrong?

Yes, that's the symptom. I cc'd a few folks to see if they could help indicate
the performance implications of such a setup -- sorry, I should have been more
explicit about that.

Thanks,
Nish
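
P.S. For anyone skimming the thread, here is a rough user-space sketch of the ordering problem as I understand it. It is not the patch itself. cpu_to_node(), early_cpu_to_node() and numa_cpu_lookup_table are the names discussed above; NR_CPUS as a plain constant, the per_cpu_numa_node array and setup_per_cpu_areas_sim() are made up purely for illustration, and the table contents assume the 2-node example topology.

```c
#include <assert.h>

#define NR_CPUS 8	/* hypothetical: sized for the 2-node example topology */

/* CPU -> node table that is already populated early in boot (assumed). */
static const int numa_cpu_lookup_table[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Hypothetical stand-in for the per-CPU numa_node variable. It is
 * zero-initialized, so every CPU appears to be on node 0 until the
 * per-CPU areas have been set up.
 */
static int per_cpu_numa_node[NR_CPUS];

/* Models a per-CPU based lookup: only meaningful after setup has run. */
static int cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return per_cpu_numa_node[cpu];
}

/* Early variant: bypasses the per-CPU area and reads the table directly. */
static int early_cpu_to_node(int cpu)
{
	int nid;

	assert(cpu >= 0 && cpu < NR_CPUS);
	nid = numa_cpu_lookup_table[cpu];

	/* Default to node 0 if the table has no valid entry for this CPU. */
	return (nid < 0) ? 0 : nid;
}

/* Hypothetical stand-in for per-CPU area setup during boot. */
static void setup_per_cpu_areas_sim(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_numa_node[cpu] = early_cpu_to_node(cpu);
}
```

In this model, cpu_to_node(5) returns 0 before setup_per_cpu_areas_sim() has run, while early_cpu_to_node(5) returns 1. That is the same node 0 versus node 1 discrepancy that shows up for CPUs 4-7 in the pcpu-alloc line.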