From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/2] powerpc/numa: Introduce logical numa id
Date: Mon, 10 Aug 2020 00:10:56 +0530 [thread overview]
Message-ID: <87eeof4q87.fsf@linux.ibm.com> (raw)
In-Reply-To: <6d880a50-09c4-d591-c88c-09fd77512ad3@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> On 8/8/20 2:15 AM, Nathan Lynch wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>> On 8/7/20 9:54 AM, Nathan Lynch wrote:
>>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>>> index e437a9ac4956..6c659aada55b 100644
>>>>> --- a/arch/powerpc/mm/numa.c
>>>>> +++ b/arch/powerpc/mm/numa.c
>>>>> @@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int nid,
>>>>> }
>>>>> }
>>>>>
>>>>> +static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE};
>>>>
>>>> It's odd to me to use MAX_NUMNODES for this array when it's going to be
>>>> indexed not by Linux's logical node IDs but by the platform-provided
>>>> domain number, which has no relation to MAX_NUMNODES.
>>>
>>>
>>> I didn't want to dynamically allocate this. We could fetch
>>> "ibm,max-associativity-domains" to find the size for it. The current
>>> code does assume the firmware group id never exceeds MAX_NUMNODES,
>>> hence the array size of MAX_NUMNODES. I agree that it is confusing.
>>> Maybe we can do #define MAX_AFFINITY_DOMAIN MAX_NUMNODES?
>>
>> Well, consider:
>>
>> - ibm,max-associativity-domains can change at runtime with LPM. This
>> doesn't happen in practice yet, but we should probably start thinking
>> about how to support that.
>> - The domain numbering isn't clearly specified to have any particular
>> properties such as beginning at zero or a contiguous range.
>>
>> While the current code likely contains assumptions contrary to these
>> points, a change such as this is an opportunity to think about whether
>> those assumptions can be reduced or removed. In particular I think it
>> would be good to gracefully degrade when the number of NUMA affinity
>> domains can exceed MAX_NUMNODES. Using the platform-supplied domain
>> numbers to directly index Linux data structures will make that
>> impossible.
>>
>> So, maybe genradix or even xarray wouldn't actually be overengineering
>> here.
>>
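For concreteness, here is a rough sketch of the xarray approach (untested,
names made up), which runs into the init-ordering problem described below:

static DEFINE_XARRAY(domain_nid_xa);

static int xa_domain_to_nid(int domain_id)
{
	void *entry = xa_load(&domain_nid_xa, domain_id);

	return entry ? xa_to_value(entry) : NUMA_NO_NODE;
}

static int xa_map_domain_to_nid(int domain_id, int nid)
{
	/* xa_store() may allocate internally, so it needs slab to be up. */
	return xa_err(xa_store(&domain_nid_xa, domain_id,
			       xa_mk_value(nid), GFP_KERNEL));
}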
>
> One of the challenges with such a data structure is that we initialize
> the nid_map before the slab allocator is available. That forces a
> memblock-based allocation, and we would end up implementing such a
> sparse data structure ourselves here.
>
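For reference, a minimal sketch (untested; the helper name is made up) of
what that early, memblock-based allocation could look like, sized from the
firmware-reported maximum rather than MAX_NUMNODES:

static u32 *nid_map;

static void __init alloc_nid_map(unsigned int max_domains)
{
	unsigned int i;

	/* memblock_alloc() returns zeroed memory, or NULL on failure. */
	nid_map = memblock_alloc(max_domains * sizeof(*nid_map),
				 SMP_CACHE_BYTES);
	if (!nid_map)
		panic("%s: failed to allocate nid_map\n", __func__);

	for (i = 0; i < max_domains; i++)
		nid_map[i] = NUMA_NO_NODE;
}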
> As you mentioned above, since the hypervisor currently limits the
> maximum affinity domain id to below ibm,max-associativity-domains, the
> array-based nid_map we have here is sufficient, and it keeps the code
> simpler.
>
> This also lets us switch to a sparser data structure in the future, as
> you requested, because the main change this series pushes is the use of
> firmware_group_id_to_nid(). The details of the data structure used to
> track that mapping are internal to that function.
How about this? It no longer uses the domain id as a direct index, and it
bounds the search by the highest NUMA node id in use on the system.
static int domain_id_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = -1 };

static int __affinity_domain_to_nid(int domain_id, int max_nid)
{
	int i;

	for (i = 0; i < max_nid; i++) {
		if (domain_id_map[i] == domain_id)
			return i;
	}
	return NUMA_NO_NODE;
}

int affinity_domain_to_nid(struct affinity_domain *domain)
{
	int nid, domain_id;
	static int last_nid = 0;
	static DEFINE_SPINLOCK(node_id_lock);

	domain_id = domain->id;
	/*
	 * For PowerNV we don't change the node id. This helps to avoid
	 * confusion w.r.t the expected node ids. On pseries, node numbers
	 * are virtualized. Hence use logical node ids on pseries.
	 */
	if (!firmware_has_feature(FW_FEATURE_LPAR))
		return domain_id;

	if (domain_id == -1)
		return NUMA_NO_NODE;

	nid = __affinity_domain_to_nid(domain_id, last_nid);
	if (nid == NUMA_NO_NODE) {
		spin_lock(&node_id_lock);
		/* Recheck with the lock held. */
		nid = __affinity_domain_to_nid(domain_id, last_nid);
		/*
		 * Only allocate a new logical id while a free slot
		 * remains; this also keeps a racing caller that passed
		 * the unlocked search from indexing past the array.
		 */
		if (nid == NUMA_NO_NODE && last_nid < MAX_NUMNODES) {
			nid = last_nid++;
			domain_id_map[nid] = domain_id;
		}
		spin_unlock(&node_id_lock);
	}

	return nid;
}
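A hypothetical caller (illustrative only; the struct layout and the
min_common_depth indexing are assumed to follow the existing numa.c
conventions) would then look like:

static int example_associativity_to_nid(const __be32 *associativity)
{
	struct affinity_domain domain = {
		.id = of_read_number(&associativity[min_common_depth], 1),
	};

	return affinity_domain_to_nid(&domain);
}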