All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>,
	Daniel Henrique Barboza <danielhb413@gmail.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity
Date: Tue, 17 Aug 2021 13:40:41 +1000	[thread overview]
Message-ID: <YRsvudTd3+J5WSgR@yekko> (raw)
In-Reply-To: <20210812132223.225214-6-aneesh.kumar@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 8177 bytes --]

On Thu, Aug 12, 2021 at 06:52:23PM +0530, Aneesh Kumar K.V wrote:
> PAPR interface currently supports two different ways of communicating resource
> grouping details to the OS. These are referred to as Form 0 and Form 1
> associativity grouping. Form 0 is the older format and is now considered
> deprecated. This patch adds another resource grouping named FORM2.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Though there a couple of cosmetic issues and one bad memory access
issue (though only in the case of buggy firmware).

[snip]
> +Form 2
> +-------
> +Form 2 associativity format adds separate device tree properties representing NUMA node distance
> +thereby making the node distance computation flexible. Form 2 also allows flexible primary
> +domain numbering. With numa distance computation now detached from the index value in
> +"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain
> +ids at the same domainID index representing resource groups of different performance/latency
> +characteristics.
> +
> +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
> +"ibm,architecture-vec-5" property.
> +
> +"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing
> +the domainIDs present in the system. The offset of the domainID in this property is
> +used as an index while computing numa distance information via "ibm,numa-distance-table".
> +
> +prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
> +N domainID encoded as with encode-int
> +
> +For ex:
> +"ibm,numa-lookup-index-table" =  {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when

Since you're using dts syntax below, it probably makes sense to use it
here as well.

> +computing the distance of domain 8 from other domains present in the system. For the rest of
> +this document, this offset will be referred to as domain distance offset.
> +
> +"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA
> +distance between resource groups/domains present in the system.
> +
> +prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
> +N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
> +The number N must be equal to the square of m where m is the number of domainIDs in the
> +numa-lookup-index-table.
> +
> +For ex:
> +ibm,numa-lookup-index-table = <3 0 8 40>;
> +ibm,numa-distace-table = <9>, /bits/ 8 < 10  20  80
> +					 20  10 160
> +					 80 160  10>;

[snip]
> +
> +	/* FORM2 affinity  */
> +	nid = of_node_to_nid_single(node);
> +	if (nid == NUMA_NO_NODE)
> +		return;
> +
> +	/*
> +	 * With FORM2 we expect NUMA distance of all possible NUMA
> +	 * nodes to be provided during boot.
> +	 */
> +	WARN(numa_distance_table[nid][nid] == -1,
> +	     "NUMA distance details for node %d not provided\n", nid);
> +}
> +
> +/*
> + * ibm,numa-lookup-index-table= {N, domainid1, domainid2, ..... domainidN}
> + * ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6, .... N elements}

.. and here too.

> + */
> +static void initialize_form2_numa_distance_lookup_table(void)
> +{
> +	int i, j;
> +	struct device_node *root;
> +	const __u8 *numa_dist_table;
> +	const __be32 *numa_lookup_index;
> +	int numa_dist_table_length;
> +	int max_numa_index, distance_index;
> +
> +	if (firmware_has_feature(FW_FEATURE_OPAL))
> +		root = of_find_node_by_path("/ibm,opal");
> +	else
> +		root = of_find_node_by_path("/rtas");
> +	if (!root)
> +		root = of_find_node_by_path("/");
> +
> +	numa_lookup_index = of_get_property(root, "ibm,numa-lookup-index-table", NULL);
> +	max_numa_index = of_read_number(&numa_lookup_index[0], 1);
> +
> +	/* first element of the array is the size and is encode-int */
> +	numa_dist_table = of_get_property(root, "ibm,numa-distance-table", NULL);
> +	numa_dist_table_length = of_read_number((const __be32 *)&numa_dist_table[0], 1);
> +	/* Skip the size which is encoded int */
> +	numa_dist_table += sizeof(__be32);
> +
> +	pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n",
> +		 numa_dist_table_length, max_numa_index);

You validate numa_dist_table_length below.  However, AFAICT you don't
anywhere check that the property actually has the length its first
element claims it does.  Yes, that represents a firmware bug, but it's
probably best if we don't ready past the end of the array in that
case, which I think is what will happen now.

Same applies for the lookup-index-table.

> +	for (i = 0; i < max_numa_index; i++)
> +		/* +1 skip the max_numa_index in the property */
> +		numa_id_index_table[i] = of_read_number(&numa_lookup_index[i + 1], 1);
> +
> +
> +	if (numa_dist_table_length != max_numa_index * max_numa_index) {
> +		WARN(1, "Wrong NUMA distance information\n");
> +		/* consider everybody else just remote. */
> +		for (i = 0;  i < max_numa_index; i++) {
> +			for (j = 0; j < max_numa_index; j++) {
> +				int nodeA = numa_id_index_table[i];
> +				int nodeB = numa_id_index_table[j];
> +
> +				if (nodeA == nodeB)
> +					numa_distance_table[nodeA][nodeB] = LOCAL_DISTANCE;
> +				else
> +					numa_distance_table[nodeA][nodeB] = REMOTE_DISTANCE;
> +			}
> +		}
> +	}
> +
> +	distance_index = 0;
> +	for (i = 0;  i < max_numa_index; i++) {
> +		for (j = 0; j < max_numa_index; j++) {
> +			int nodeA = numa_id_index_table[i];
> +			int nodeB = numa_id_index_table[j];
> +
> +			numa_distance_table[nodeA][nodeB] = numa_dist_table[distance_index++];
> +			pr_debug("dist[%d][%d]=%d ", nodeA, nodeB, numa_distance_table[nodeA][nodeB]);
> +		}
> +	}
> +	of_node_put(root);
>  }
>  
>  static int __init find_primary_domain_index(void)
> @@ -344,6 +449,9 @@ static int __init find_primary_domain_index(void)
>  	 */
>  	if (firmware_has_feature(FW_FEATURE_OPAL)) {
>  		affinity_form = FORM1_AFFINITY;
> +	} else if (firmware_has_feature(FW_FEATURE_FORM2_AFFINITY)) {
> +		dbg("Using form 2 affinity\n");
> +		affinity_form = FORM2_AFFINITY;
>  	} else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) {
>  		dbg("Using form 1 affinity\n");
>  		affinity_form = FORM1_AFFINITY;
> @@ -388,9 +496,12 @@ static int __init find_primary_domain_index(void)
>  
>  		index = of_read_number(&distance_ref_points[1], 1);
>  	} else {
> +		/*
> +		 * Both FORM1 and FORM2 affinity find the primary domain details
> +		 * at the same offset.
> +		 */
>  		index = of_read_number(distance_ref_points, 1);
>  	}
> -
>  	/*
>  	 * Warn and cap if the hardware supports more than
>  	 * MAX_DISTANCE_REF_POINTS domains.
> @@ -819,6 +930,12 @@ static int __init parse_numa_properties(void)
>  
>  	dbg("NUMA associativity depth for CPU/Memory: %d\n", primary_domain_index);
>  
> +	/*
> +	 * If it is FORM2 initialize the distance table here.
> +	 */
> +	if (affinity_form == FORM2_AFFINITY)
> +		initialize_form2_numa_distance_lookup_table();
> +
>  	/*
>  	 * Even though we connect cpus to numa domains later in SMP
>  	 * init, we need to know the node ids now. This is because
> diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
> index 5d4c2bc20bba..f162156b7b68 100644
> --- a/arch/powerpc/platforms/pseries/firmware.c
> +++ b/arch/powerpc/platforms/pseries/firmware.c
> @@ -123,6 +123,7 @@ vec5_fw_features_table[] = {
>  	{FW_FEATURE_PRRN,		OV5_PRRN},
>  	{FW_FEATURE_DRMEM_V2,		OV5_DRMEM_V2},
>  	{FW_FEATURE_DRC_INFO,		OV5_DRC_INFO},
> +	{FW_FEATURE_FORM2_AFFINITY,	OV5_FORM2_AFFINITY},
>  };
>  
>  static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-08-17  3:42 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12 13:22 [PATCH v8 0/5] Add support for FORM2 associativity Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 1/5] powerpc/pseries: rename min_common_depth to primary_domain_index Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 2/5] powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 3/5] powerpc/pseries: Consolidate different NUMA distance update code paths Aneesh Kumar K.V
2021-08-17  3:27   ` David Gibson
2021-08-12 13:22 ` [PATCH v8 4/5] powerpc/pseries: Add a helper for form1 cpu distance Aneesh Kumar K.V
2021-08-12 13:22 ` [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity Aneesh Kumar K.V
2021-08-17  3:40   ` David Gibson [this message]
2021-08-18 13:38 ` [PATCH v8 0/5] " Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YRsvudTd3+J5WSgR@yekko \
    --to=david@gibson.dropbear.id.au \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=danielhb413@gmail.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=nathanl@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.