From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 1/3] powerpc/numa: Introduce logical numa id
Date: Tue, 18 Aug 2020 13:51:16 +0530
Message-ID: <87zh6s1i0z.fsf@linux.ibm.com>
In-Reply-To: <20200817114908.GA32655@linux.vnet.ibm.com>
Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2020-08-17 17:04:24]:
>
>> On 8/17/20 4:29 PM, Srikar Dronamraju wrote:
>> > * Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [2020-08-17 16:02:36]:
>> >
>> > > We use ibm,associativity and ibm,associativity-lookup-arrays to derive the numa
>> > > node numbers. These device tree properties are firmware indicated grouping of
>> > > resources based on their hierarchy in the platform. These numbers (group id) are
>> > > not sequential and hypervisor/firmware can follow different numbering schemes.
>> > > For ex: on powernv platforms, we group them in the below order.
>> > >
>> > > * - CCM node ID
>> > > * - HW card ID
>> > > * - HW module ID
>> > > * - Chip ID
>> > > * - Core ID
>> > >
>> > > Based on ibm,associativity-reference-points we use one of the above group ids as
>> > > Linux NUMA node id. (On PowerNV platform Chip ID is used). This results
>> > > in Linux reporting non-linear NUMA node id and which also results in Linux
>> > > reporting empty node 0 NUMA nodes.
>> > >
>> > > This can be resolved by mapping the firmware provided group id to a logical Linux
>> > > NUMA id. In this patch, we do this only for pseries platforms considering the
>> > > firmware group id is a virtualized entity and users would not have drawn any
>> > > conclusion based on the Linux Numa Node id.
>> > >
>> > > On PowerNV platform since we have historically mapped Chip ID as Linux NUMA node
>> > > id, we keep the existing Linux NUMA node id numbering.
>> >
>> > I still dont understand how you are going to handle numa distances.
>> > With your patch, have you tried dlpar add/remove on a sparsely noded machine?
>> >
>>
>> We follow the same steps when fetching distance information. Instead of
>> using affinity domain id, we now use the mapped node id. The relevant hunk
>> in the patch is
>>
>> + nid = affinity_domain_to_nid(&domain);
>>
>> if (nid > 0 &&
>> - of_read_number(associativity, 1) >= distance_ref_points_depth) {
>> + of_read_number(associativity, 1) >= distance_ref_points_depth) {
>> /*
>> * Skip the length field and send start of associativity array
>> */
>>
>> I haven't tried dlpar add/remove. I don't have a setup to try that. Do you
>> see a problem there?
>>
>
> Yes, I think there can be 2 problems.
>
> 1. distance table may be filled with incorrect data.
> 2. numactl -H distance table shows symmetric data, the symmetric nature may
> be lost.
>
After discussing with Srikar to understand these concerns better, the
conclusions are below.

1) There is no corruption of node distance. We do handle node distance
correctly. But the numactl -H output can be such that NUMA nodes with a
higher number are not necessarily further away from node 0, i.e. we can
see output like the below.

node   0   1   2   3
  0:  10  40  40  20
  1:  40  10  40  40
  2:  40  40  10  40
  3:  20  40  40  10

Here node 3 is closer to node 0 than nodes 1 and 2 are. I am not sure this
is going to break any userspace.
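
To make the point concrete, here is a small Python sketch (a model for
illustration only, not kernel code) that checks the table above. The
helper names is_symmetric, is_monotonic_from and nodes_by_distance are
hypothetical. It shows that the table is still symmetric, and that only
the "higher node number means further away" ordering is lost.

```python
# Distance table from the numactl -H output above (row = from, column = to).
DISTANCE = [
    [10, 40, 40, 20],
    [40, 10, 40, 40],
    [40, 40, 10, 40],
    [20, 40, 40, 10],
]


def is_symmetric(table):
    """True if distance(i, j) == distance(j, i) for every pair of nodes."""
    n = len(table)
    return all(table[i][j] == table[j][i] for i in range(n) for j in range(n))


def is_monotonic_from(table, src=0):
    """True if the distance from src never decreases as the node number grows."""
    row = [dist for nid, dist in enumerate(table[src]) if nid != src]
    return all(a <= b for a, b in zip(row, row[1:]))


def nodes_by_distance(table, src=0):
    """Other nodes ordered nearest first; ties are broken by node number."""
    others = [nid for nid in range(len(table)) if nid != src]
    return sorted(others, key=lambda nid: (table[src][nid], nid))


print(is_symmetric(DISTANCE))       # symmetry is preserved
print(is_monotonic_from(DISTANCE))  # ordering by node number is not
print(nodes_by_distance(DISTANCE))  # node 3 sorts ahead of nodes 1 and 2
```

Userspace that looks up distances per node pair is unaffected. Only code
that assumes a lower node number implies a nearer node would notice.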

2) Node numbers can change if we do a DLPAR add of memory/cpu and then
reboot, i.e. we boot with resource domain ids 4 and 6 and later add
resources from domain 5. In this case, nodes 0, 1 and 2 will map to
domain ids 4, 6 and 5. On reboot, we can map them such that

node 0 -> 4
node 1 -> 5
node 2 -> 6

I guess this is still ok because we are running in a virtualized
environment and the node number to domain id mapping is never guaranteed
to be the same across reboots.
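
The renumbering can be modelled with a few lines of Python. This is a
sketch under two assumptions, not the code in the patch: logical node ids
are handed out in the order domain ids are first seen, and after reboot
the domains happen to be seen in ascending order. The function name
assign_logical_nids is hypothetical.

```python
def assign_logical_nids(domain_ids):
    """Give each firmware domain id the next free logical node id.

    Assumption: ids are allocated in first-seen order, starting at 0.
    """
    nid_of = {}
    for domain in domain_ids:
        if domain not in nid_of:
            nid_of[domain] = len(nid_of)
    return nid_of


# Boot with domains 4 and 6, then DLPAR-add resources from domain 5.
before_reboot = assign_logical_nids([4, 6, 5])

# After reboot, assuming the domains are seen in ascending order.
after_reboot = assign_logical_nids([4, 5, 6])

print(before_reboot)  # domain 6 is node 1, domain 5 is node 2
print(after_reboot)   # domain 5 is node 1, domain 6 is node 2
```

Domain 4 keeps node 0 in both cases, while domains 5 and 6 swap node
numbers across the reboot.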
-aneesh