linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity
Date: Tue, 17 Aug 2021 13:40:41 +1000	[thread overview]
Message-ID: <YRsvudTd3+J5WSgR@yekko> (raw)
In-Reply-To: <20210812132223.225214-6-aneesh.kumar@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 8177 bytes --]

On Thu, Aug 12, 2021 at 06:52:23PM +0530, Aneesh Kumar K.V wrote:
> PAPR interface currently supports two different ways of communicating resource
> grouping details to the OS. These are referred to as Form 0 and Form 1
> associativity grouping. Form 0 is the older format and is now considered
> deprecated. This patch adds another resource grouping named FORM2.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Though there a couple of cosmetic issues and one bad memory access
issue (though only in the case of buggy firmware).

[snip]
> +Form 2
> +-------
> +Form 2 associativity format adds separate device tree properties representing NUMA node distance
> +thereby making the node distance computation flexible. Form 2 also allows flexible primary
> +domain numbering. With numa distance computation now detached from the index value in
> +"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain
> +ids at the same domainID index representing resource groups of different performance/latency
> +characteristics.
> +
> +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
> +"ibm,architecture-vec-5" property.
> +
> +"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing
> +the domainIDs present in the system. The offset of the domainID in this property is
> +used as an index while computing numa distance information via "ibm,numa-distance-table".
> +
> +prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
> +N domainID encoded as with encode-int
> +
> +For ex:
> +"ibm,numa-lookup-index-table" =  {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when

Since you're using dts syntax below, it probably makes sense to use it
here as well.

> +computing the distance of domain 8 from other domains present in the system. For the rest of
> +this document, this offset will be referred to as domain distance offset.
> +
> +"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA
> +distance between resource groups/domains present in the system.
> +
> +prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
> +N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
> +The number N must be equal to the square of m where m is the number of domainIDs in the
> +numa-lookup-index-table.
> +
> +For ex:
> +ibm,numa-lookup-index-table = <3 0 8 40>;
> +ibm,numa-distace-table = <9>, /bits/ 8 < 10  20  80
> +					 20  10 160
> +					 80 160  10>;

[snip]
> +
> +	/* FORM2 affinity  */
> +	nid = of_node_to_nid_single(node);
> +	if (nid == NUMA_NO_NODE)
> +		return;
> +
> +	/*
> +	 * With FORM2 we expect NUMA distance of all possible NUMA
> +	 * nodes to be provided during boot.
> +	 */
> +	WARN(numa_distance_table[nid][nid] == -1,
> +	     "NUMA distance details for node %d not provided\n", nid);
> +}
> +
> +/*
> + * ibm,numa-lookup-index-table= {N, domainid1, domainid2, ..... domainidN}
> + * ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6, .... N elements}

.. and here too.

> + */
> +static void initialize_form2_numa_distance_lookup_table(void)
> +{
> +	int i, j;
> +	struct device_node *root;
> +	const __u8 *numa_dist_table;
> +	const __be32 *numa_lookup_index;
> +	int numa_dist_table_length;
> +	int max_numa_index, distance_index;
> +
> +	if (firmware_has_feature(FW_FEATURE_OPAL))
> +		root = of_find_node_by_path("/ibm,opal");
> +	else
> +		root = of_find_node_by_path("/rtas");
> +	if (!root)
> +		root = of_find_node_by_path("/");
> +
> +	numa_lookup_index = of_get_property(root, "ibm,numa-lookup-index-table", NULL);
> +	max_numa_index = of_read_number(&numa_lookup_index[0], 1);
> +
> +	/* first element of the array is the size and is encode-int */
> +	numa_dist_table = of_get_property(root, "ibm,numa-distance-table", NULL);
> +	numa_dist_table_length = of_read_number((const __be32 *)&numa_dist_table[0], 1);
> +	/* Skip the size which is encoded int */
> +	numa_dist_table += sizeof(__be32);
> +
> +	pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n",
> +		 numa_dist_table_length, max_numa_index);

You validate numa_dist_table_length below.  However, AFAICT you don't
anywhere check that the property actually has the length its first
element claims it does.  Yes, that represents a firmware bug, but it's
probably best if we don't ready past the end of the array in that
case, which I think is what will happen now.

Same applies for the lookup-index-table.

> +	for (i = 0; i < max_numa_index; i++)
> +		/* +1 skip the max_numa_index in the property */
> +		numa_id_index_table[i] = of_read_number(&numa_lookup_index[i + 1], 1);
> +
> +
> +	if (numa_dist_table_length != max_numa_index * max_numa_index) {
> +		WARN(1, "Wrong NUMA distance information\n");
> +		/* consider everybody else just remote. */
> +		for (i = 0;  i < max_numa_index; i++) {
> +			for (j = 0; j < max_numa_index; j++) {
> +				int nodeA = numa_id_index_table[i];
> +				int nodeB = numa_id_index_table[j];
> +
> +				if (nodeA == nodeB)
> +					numa_distance_table[nodeA][nodeB] = LOCAL_DISTANCE;
> +				else
> +					numa_distance_table[nodeA][nodeB] = REMOTE_DISTANCE;
> +			}
> +		}
> +	}
> +
> +	distance_index = 0;
> +	for (i = 0;  i < max_numa_index; i++) {
> +		for (j = 0; j < max_numa_index; j++) {
> +			int nodeA = numa_id_index_table[i];
> +			int nodeB = numa_id_index_table[j];
> +
> +			numa_distance_table[nodeA][nodeB] = numa_dist_table[distance_index++];
> +			pr_debug("dist[%d][%d]=%d ", nodeA, nodeB, numa_distance_table[nodeA][nodeB]);
> +		}
> +	}
> +	of_node_put(root);
>  }
>  
>  static int __init find_primary_domain_index(void)
> @@ -344,6 +449,9 @@ static int __init find_primary_domain_index(void)
>  	 */
>  	if (firmware_has_feature(FW_FEATURE_OPAL)) {
>  		affinity_form = FORM1_AFFINITY;
> +	} else if (firmware_has_feature(FW_FEATURE_FORM2_AFFINITY)) {
> +		dbg("Using form 2 affinity\n");
> +		affinity_form = FORM2_AFFINITY;
>  	} else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) {
>  		dbg("Using form 1 affinity\n");
>  		affinity_form = FORM1_AFFINITY;
> @@ -388,9 +496,12 @@ static int __init find_primary_domain_index(void)
>  
>  		index = of_read_number(&distance_ref_points[1], 1);
>  	} else {
> +		/*
> +		 * Both FORM1 and FORM2 affinity find the primary domain details
> +		 * at the same offset.
> +		 */
>  		index = of_read_number(distance_ref_points, 1);
>  	}
> -
>  	/*
>  	 * Warn and cap if the hardware supports more than
>  	 * MAX_DISTANCE_REF_POINTS domains.
> @@ -819,6 +930,12 @@ static int __init parse_numa_properties(void)
>  
>  	dbg("NUMA associativity depth for CPU/Memory: %d\n", primary_domain_index);
>  
> +	/*
> +	 * If it is FORM2 initialize the distance table here.
> +	 */
> +	if (affinity_form == FORM2_AFFINITY)
> +		initialize_form2_numa_distance_lookup_table();
> +
>  	/*
>  	 * Even though we connect cpus to numa domains later in SMP
>  	 * init, we need to know the node ids now. This is because
> diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
> index 5d4c2bc20bba..f162156b7b68 100644
> --- a/arch/powerpc/platforms/pseries/firmware.c
> +++ b/arch/powerpc/platforms/pseries/firmware.c
> @@ -123,6 +123,7 @@ vec5_fw_features_table[] = {
>  	{FW_FEATURE_PRRN,		OV5_PRRN},
>  	{FW_FEATURE_DRMEM_V2,		OV5_DRMEM_V2},
>  	{FW_FEATURE_DRC_INFO,		OV5_DRC_INFO},
> +	{FW_FEATURE_FORM2_AFFINITY,	OV5_FORM2_AFFINITY},
>  };
>  
>  static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-08-17  3:42 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12 13:22 [PATCH v8 0/5] Add support for FORM2 associativity Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 1/5] powerpc/pseries: rename min_common_depth to primary_domain_index Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 2/5] powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 3/5] powerpc/pseries: Consolidate different NUMA distance update code paths Aneesh Kumar K.V
2021-08-17  3:27   ` David Gibson
2021-08-12 13:22 ` [PATCH v8 4/5] powerpc/pseries: Add a helper for form1 cpu distance Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity Aneesh Kumar K.V
2021-08-17  3:40   ` David Gibson [this message]
2021-08-18 13:38 ` [PATCH v8 0/5] " Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YRsvudTd3+J5WSgR@yekko \
    --to=david@gibson.dropbear.id.au \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=danielhb413@gmail.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=nathanl@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).