Date: Mon, 28 Sep 2015 10:34:34 -0700
From: Nishanth Aravamudan
To: Raghavendra K T
Cc: benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au, anton@samba.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, cl@linux.com, gkurz@linux.vnet.ibm.com, grant.likely@linaro.org, nikunj@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com
Subject: Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support
Message-ID: <20150928173434.GE48470@linux.vnet.ibm.com>
References: <1443378553-2146-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com>
In-Reply-To: <1443378553-2146-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, we map the chipid
> got from device tree is naturally mapped (directly) to nid.

chipid is an OPAL concept, I believe, and not documented in PAPR... How
does this work under PowerVM?

> Potential side effect of that is:
>
> 1) There are several places in kernel that assumes serial node numbering.
> and memory allocations assume that all the nodes from 0-(highest nid)
> exist inturn ending up allocating memory for the nodes that does not exist.
>
> 2) For virtualization use cases (such as qemu, libvirt, openstack), mapping
> sparse nid of the host system to contiguous nids of guest (numa affinity,
> placement) could be a challenge.
>
> Possible Solutions:
> 1) Handling the memory allocations is kernel case by case: Though in some
> cases it is easy to achieve, some cases may be intrusive/not trivial.
> at the end it does not handle side effect (2) above.
>
> 2) Map the sparse chipid got from device tree to a serial nid at kernel
> level (The idea proposed in this series).
> Pro: It is more natural to handle at kernel level than at lower (OPAL) layer.
> con: The chipid is in device tree no longer the same as nid in kernel

Is there any debugging/logging? Looks like not -- so how does a sysadmin
map from firmware-provided values to the Linux values? That's going to
make debugging of large systems (PowerVM or otherwise) less than
pleasant, it seems? Possibly you could put something in sysfs?

-Nish
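
To make the debugging concern concrete, here is a rough user-space sketch of a sparse chipid to serial nid mapping that also keeps the reverse table, so the firmware value can still be recovered from the Linux nid. This is not code from the series: every name (nid_for_chipid, chipid_for_nid, nid_map_dump, MAX_CHIPID) is hypothetical, and assigning nids in discovery order is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHIPID 256	/* hypothetical upper bound on firmware chip ids */
#define NO_ID      (-1)

/* Illustrative tables only; the series may store the mapping differently. */
static int chipid_to_nid[MAX_CHIPID];
static int nid_to_chipid[MAX_CHIPID];
static int next_nid;

/* Mark every slot as unmapped. */
static void nid_map_init(void)
{
	for (int i = 0; i < MAX_CHIPID; i++) {
		chipid_to_nid[i] = NO_ID;
		nid_to_chipid[i] = NO_ID;
	}
	next_nid = 0;
}

/*
 * Return the serial nid for a chipid, assigning the next free nid the
 * first time a chipid is seen (assumed policy: discovery order).
 */
static int nid_for_chipid(int chipid)
{
	if (chipid < 0 || chipid >= MAX_CHIPID)
		return NO_ID;
	if (chipid_to_nid[chipid] == NO_ID) {
		assert(next_nid < MAX_CHIPID);
		chipid_to_nid[chipid] = next_nid;
		nid_to_chipid[next_nid] = chipid;
		next_nid++;
	}
	return chipid_to_nid[chipid];
}

/* Reverse lookup: which firmware chipid backs this Linux nid? */
static int chipid_for_nid(int nid)
{
	if (nid < 0 || nid >= next_nid)
		return NO_ID;
	return nid_to_chipid[nid];
}

/* Dump the whole mapping, e.g. as a boot-time log line per node. */
static void nid_map_dump(FILE *out)
{
	for (int nid = 0; nid < next_nid; nid++)
		fprintf(out, "nid %d <- chipid %d\n", nid, nid_to_chipid[nid]);
}
```

With the chipids from the problem description (0, 1, 16, 17) seen in that order, this hands out nids 0 through 3. Something like nid_map_dump() at boot, or the reverse table exposed per node in sysfs, is the sort of thing that would let a sysadmin get back to the firmware-provided values.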