public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
* CPU only nodes (no memory) patch for NUMA/DISCONTIG
@ 2004-02-23 19:45 Robert Picco
  2004-02-23 21:16 ` David Mosberger
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Robert Picco @ 2004-02-23 19:45 UTC (permalink / raw)
  To: linux-ia64

David:

Jesse has reviewed this for me and we are in agreement.  CPU only nodes 
are moved to a node with memory which is at the closest relative 
distance per the SLIT information.  Any node reassignments will result 
in the compression of the nodes and renumbering the nid values where 
appropriate.

There is hope our firmware folks will handle interleaved memory 
correctly in regards to SLIT and PXM.  There should be more data on this 
later in the week.  No matter what the outcome of a firmware change,  
the patch will work.

Bob

--- linux-2.6.3-orig/arch/ia64/mm/discontig.c	2004-02-17 22:59:19.000000000 -0500
+++ linux-2.6.3/arch/ia64/mm/discontig.c	2004-02-23 11:02:06.000000000 -0500
@@ -41,6 +41,99 @@
 static struct early_node_data mem_data[NR_NODES] __initdata;
 
 /*
+ * This function will move nodes with only CPUs (no memory)
+ * to a node with memory which is at the minimum numa_slit distance.
+ * Any reassigments will result in the compression of the nodes
+ * and renumbering the nid values where appropriate.
+ */
+static void __init reassign_cpu_only_nodes(void)
+{
+	struct node_memblk_s *p;
+	int i, j, k, nnode, nid, cpu, cpunid;
+	u8 cslit, slit;
+	static DECLARE_BITMAP(nodes_with_mem, NR_NODES) __initdata;
+	static u8 numa_slit_fix[MAX_NUMNODES * MAX_NUMNODES] __initdata;
+	static int node_flip[NR_NODES] __initdata;
+
+	for (nnode = 0, p = &node_memblk[0]; p < &node_memblk[num_node_memblks]; p++) 
+		if (!test_bit(p->nid, (void *) nodes_with_mem)) {
+			set_bit(p->nid, (void *) nodes_with_mem);
+			nnode++;
+		}
+	
+	/*
+	 * All nids with memory.
+	 */
+	if (nnode = numnodes) 
+		return;
+
+	/*
+	 * Change nids and attempt to migrate CPU only nodes
+	 * to the best numa_slit (closest neighbor) possible.
+	 */
+	for (nid = 0, i = 0; i < numnodes; i++)  {
+		if (test_bit(i, (void *) nodes_with_mem)) {
+			node_flip[nid] = i;
+
+			if (i = nid) {
+				nid++;
+				continue;
+			}
+
+			for (p = &node_memblk[0]; p < &node_memblk[num_node_memblks]; p++)
+				if (p->nid = i)
+					p->nid = nid;
+
+			cpunid = nid;
+			nid++;
+		} else
+			cpunid = numnodes;
+			
+		for (cpu = 0; cpu < NR_CPUS; cpu++) 
+			if (node_cpuid[cpu].nid = i) {
+				if (cpunid < numnodes) {
+					node_cpuid[cpu].nid = cpunid;
+					continue;
+				}
+
+				for (slit = 0xff, k = numnodes + numnodes, j = 0; j < numnodes; j++)
+					if (i = j)
+						continue;
+					else if (test_bit(j, (void *) nodes_with_mem)) {
+						cslit = numa_slit[i * numnodes + j];
+						if (cslit < slit) {
+							k = numnodes + j;
+							slit = cslit;
+						}
+					}
+
+				node_cpuid[cpu].nid = k;
+			}
+	}
+
+	for (cpu = 0; cpu < NR_CPUS; cpu++)
+		if (node_cpuid[cpu].nid = (numnodes + numnodes))
+			node_cpuid[cpu].nid = nnode - 1;
+		else
+			for (i = 0; i < nnode; i++)
+				if (node_flip[i] = (node_cpuid[cpu].nid - numnodes)) {
+					node_cpuid[cpu].nid = i;
+					break;
+				}
+
+	for (i = 0; i < nnode; i++)
+		for (j = 0; j < nnode; j++)
+			numa_slit_fix[i * nnode + j] = 
+				numa_slit[node_flip[i] * numnodes + node_flip[j]];
+			
+	memcpy(numa_slit, numa_slit_fix, sizeof (numa_slit));
+
+	numnodes = nnode;
+
+	return;
+}
+
+/*
  * To prevent cache aliasing effects, align per-node structures so that they
  * start at addresses that are strided by node number.
  */
@@ -301,6 +394,9 @@
 	min_low_pfn = -1;
 	max_low_pfn = 0;
 
+	if (numnodes > 1) 
+		reassign_cpu_only_nodes();
+
 	/* These actually end up getting called by call_pernode_memory() */
 	efi_memmap_walk(filter_rsvd_memory, build_node_maps);
 	efi_memmap_walk(filter_rsvd_memory, find_pernode_space);






^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: CPU only nodes (no memory) patch for NUMA/DISCONTIG
  2004-02-23 19:45 CPU only nodes (no memory) patch for NUMA/DISCONTIG Robert Picco
@ 2004-02-23 21:16 ` David Mosberger
  2004-02-23 21:36 ` Jesse Barnes
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: David Mosberger @ 2004-02-23 21:16 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Mon, 23 Feb 2004 14:45:00 -0500, Robert Picco <Robert.Picco@hp.com> said:

  Robert> +	static DECLARE_BITMAP(nodes_with_mem, NR_NODES) __initdata;
  Robert> +	static u8 numa_slit_fix[MAX_NUMNODES * MAX_NUMNODES] __initdata;
  Robert> +	static int node_flip[NR_NODES] __initdata;

Perhaps a comment would be in place as to why these are static?
I assume they're to avoid inordinate stack-space consumption?  Also,
the code won't be re-entrant which may be something worth pointing
out in the comment for the function.

Then there is my usual complaint about trailing white space.

Otherwise, the patch is fine with me (though I don't think the code is
easy to follow; there seem to be some non-obvious inversions in the
meaning of the bitmap bits).

	--david

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: CPU only nodes (no memory) patch for NUMA/DISCONTIG
  2004-02-23 19:45 CPU only nodes (no memory) patch for NUMA/DISCONTIG Robert Picco
  2004-02-23 21:16 ` David Mosberger
@ 2004-02-23 21:36 ` Jesse Barnes
  2004-02-24 14:40 ` Robert Picco
  2004-02-24 21:54 ` David Mosberger
  3 siblings, 0 replies; 5+ messages in thread
From: Jesse Barnes @ 2004-02-23 21:36 UTC (permalink / raw)
  To: linux-ia64

On Mon, Feb 23, 2004 at 02:45:00PM -0500, Robert Picco wrote:
> David:
> 
> Jesse has reviewed this for me and we are in agreement.  CPU only nodes 
> are moved to a node with memory which is at the closest relative 
> distance per the SLIT information.  Any node reassignments will result 
> in the compression of the nodes and renumbering the nid values where 
> appropriate.

I really just reviewed the feature, I think it actually ends up being a
little better (or at least easier to get right) for situations where a
node has CPUs but no memory than the early bootmem scheme I was
proposing.  I haven't looked at the code much though (aside from asking
for a function comment and DECLARE_BITMAP usage), so I'm not sure about
that part of it...

> There is hope our firmware folks will handle interleaved memory 
> correctly in regards to SLIT and PXM.  There should be more data on this 
> later in the week.  No matter what the outcome of a firmware change,  
> the patch will work.

Cool, keep us posted.

> /*
> + * This function will move nodes with only CPUs (no memory)
> + * to a node with memory which is at the minimum numa_slit distance.
> + * Any reassigments will result in the compression of the nodes
> + * and renumbering the nid values where appropriate.
> + */

May as well use the kdoc style function comment here, just to be
consistent with the rest of the file (it's documented somewhere in
Documentation/kernel-doc-nano-HOWTO.txt).

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: CPU only nodes (no memory) patch for NUMA/DISCONTIG
  2004-02-23 19:45 CPU only nodes (no memory) patch for NUMA/DISCONTIG Robert Picco
  2004-02-23 21:16 ` David Mosberger
  2004-02-23 21:36 ` Jesse Barnes
@ 2004-02-24 14:40 ` Robert Picco
  2004-02-24 21:54 ` David Mosberger
  3 siblings, 0 replies; 5+ messages in thread
From: Robert Picco @ 2004-02-24 14:40 UTC (permalink / raw)
  To: linux-ia64

David Mosberger wrote:

>>>>>>On Mon, 23 Feb 2004 14:45:00 -0500, Robert Picco <Robert.Picco@hp.com> said:
>>>>>>            
>>>>>>
>
>  Robert> +	static DECLARE_BITMAP(nodes_with_mem, NR_NODES) __initdata;
>  Robert> +	static u8 numa_slit_fix[MAX_NUMNODES * MAX_NUMNODES] __initdata;
>  Robert> +	static int node_flip[NR_NODES] __initdata;
>
>Perhaps a comment would be in place as to why these are static?
>I assume they're to avoid inordinate stack-space consumption?  Also,
>the code won't be re-entrant which may be something worth pointing
>out in the comment for the function.
>  
>
Added commentary here.  There are additional comments in code which 
hopefully elucidates the algorithm.
Also I've addressed the other feedback.

thanks,

Bob

>Then there is my usual complaint about trailing white space.
>
>Otherwise, the patch is fine with me (though I don't think the code is
>easy to follow; there seem to be some non-obvious inversions in the
>meaning of the bitmap bits).
>
>	--david
>-
>To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>  
>

--- linux-2.6.3-orig/arch/ia64/mm/discontig.c	2004-02-23 14:50:56.000000000 -0500
+++ linux-2.6.3/arch/ia64/mm/discontig.c	2004-02-24 07:59:09.000000000 -0500
@@ -40,6 +40,125 @@
 
 static struct early_node_data mem_data[NR_NODES] __initdata;
 
+/**
+ * reassign_cpu_only_nodes - called from find_memory to move CPU only nodes to a memory node
+ *
+ * This function will move nodes with only CPUs (no memory)
+ * to a node with memory which is at the minimum numa_slit distance.
+ * Any reassigments will result in the compression of the nodes
+ * and renumbering the nid values where appropriate.
+ * The static declarations below are to avoid large stack size which
+ * makes the code not re-entrant.
+ */
+static void __init reassign_cpu_only_nodes(void)
+{
+	struct node_memblk_s *p;
+	int i, j, k, nnode, nid, cpu, cpunid;
+	u8 cslit, slit;
+	static DECLARE_BITMAP(nodes_with_mem, NR_NODES) __initdata;
+	static u8 numa_slit_fix[MAX_NUMNODES * MAX_NUMNODES] __initdata;
+	static int node_flip[NR_NODES] __initdata;
+
+	for (nnode = 0, p = &node_memblk[0]; p < &node_memblk[num_node_memblks]; p++)
+		if (!test_bit(p->nid, (void *) nodes_with_mem)) {
+			set_bit(p->nid, (void *) nodes_with_mem);
+			nnode++;
+		}
+	
+	/*
+	 * All nids with memory.
+	 */
+	if (nnode = numnodes)
+		return;
+
+	/*
+	 * Change nids and attempt to migrate CPU only nodes
+	 * to the best numa_slit (closest neighbor) possible.
+	 * For reassigned CPU nodes a nid can't be arrived at
+	 * until after this loop because the target nid's new
+	 * identity might not have been established yet. So
+	 * new nid values are fabricated above numnodes and
+	 * mapped back later to their true value.
+	 */
+	for (nid = 0, i = 0; i < numnodes; i++)  {
+		if (test_bit(i, (void *) nodes_with_mem)) {
+			/*
+			 * Save original nid value for numa_slit
+			 * fixup and node_cpuid reassignments.
+			 */
+			node_flip[nid] = i;
+
+			if (i = nid) {
+				nid++;
+				continue;
+			}
+
+			for (p = &node_memblk[0]; p < &node_memblk[num_node_memblks]; p++)
+				if (p->nid = i)
+					p->nid = nid;
+
+			cpunid = nid;
+			nid++;
+		} else
+			cpunid = numnodes;
+			
+		for (cpu = 0; cpu < NR_CPUS; cpu++)
+			if (node_cpuid[cpu].nid = i) {
+				/* For nodes not being reassigned just fix the cpu's nid. */
+				if (cpunid < numnodes) {
+					node_cpuid[cpu].nid = cpunid;
+					continue;
+				}
+
+				/*
+				 * For nodes being reassigned, find best node by
+				 * numa_slit information and then make a temporary
+				 * nid value based on current nid and numnodes.
+				 */
+				for (slit = 0xff, k = numnodes + numnodes, j = 0; j < numnodes; j++)
+					if (i = j)
+						continue;
+					else if (test_bit(j, (void *) nodes_with_mem)) {
+						cslit = numa_slit[i * numnodes + j];
+						if (cslit < slit) {
+							k = numnodes + j;
+							slit = cslit;
+						}
+					}
+
+				node_cpuid[cpu].nid = k;
+			}
+	}
+
+	/*
+	 * Fixup temporary nid values for CPU only nodes.
+	 */
+	for (cpu = 0; cpu < NR_CPUS; cpu++)
+		if (node_cpuid[cpu].nid = (numnodes + numnodes))
+			node_cpuid[cpu].nid = nnode - 1;
+		else
+			for (i = 0; i < nnode; i++)
+				if (node_flip[i] = (node_cpuid[cpu].nid - numnodes)) {
+					node_cpuid[cpu].nid = i;
+					break;
+				}
+
+	/*
+	 * Fix numa_slit by compressing from larger
+	 * nid array to reduced nid array.
+	 */
+	for (i = 0; i < nnode; i++)
+		for (j = 0; j < nnode; j++)
+			numa_slit_fix[i * nnode + j] +				numa_slit[node_flip[i] * numnodes + node_flip[j]];
+			
+	memcpy(numa_slit, numa_slit_fix, sizeof (numa_slit));
+
+	numnodes = nnode;
+
+	return;
+}
+
 /*
  * To prevent cache aliasing effects, align per-node structures so that they
  * start at addresses that are strided by node number.
@@ -301,6 +420,9 @@
 	min_low_pfn = -1;
 	max_low_pfn = 0;
 
+	if (numnodes > 1)
+		reassign_cpu_only_nodes();
+
 	/* These actually end up getting called by call_pernode_memory() */
 	efi_memmap_walk(filter_rsvd_memory, build_node_maps);
 	efi_memmap_walk(filter_rsvd_memory, find_pernode_space);



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: CPU only nodes (no memory) patch for NUMA/DISCONTIG
  2004-02-23 19:45 CPU only nodes (no memory) patch for NUMA/DISCONTIG Robert Picco
                   ` (2 preceding siblings ...)
  2004-02-24 14:40 ` Robert Picco
@ 2004-02-24 21:54 ` David Mosberger
  3 siblings, 0 replies; 5+ messages in thread
From: David Mosberger @ 2004-02-24 21:54 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Tue, 24 Feb 2004 09:40:49 -0500, Robert Picco <Robert.Picco@hp.com> said:

  Robert> Added commentary here.  There are additional comments in code which 
  Robert> hopefully elucidates the algorithm.
  Robert> Also I've addressed the other feedback.

The patch looked fine to me.  The only changes I made is to squish the
remaining trailing whitespace (which was only on otherwise empty
lines, in your case) and to change "CPU only" to "CPU-only" (seems
easier to parse that way).  I cooked up a ChangeLog entry, but in the
future, I'd really appreciate it if all patch-submissions would always
include the desired ChangeLog.  Otherwise, I have to go digging
through old mail and that takes time and makes it likely that I'll get
something wrong.  I attached the ChangeLog entry below.  If you want
something corrected, let me know soon, while I can still edit it.

Thanks,

	--david

ChangeSet@1.1666, 2004-02-24 13:48:30-08:00, Robert.Picco@hp.com
  [PATCH] ia64: add support for NUMA machines with CPU-only (memory-less) nodes

  This patch works around a limitation in the current NUMA code, which
  doesn't support CPU-only (memory-less) nodes.  With this patch
  CPU-only nodes are moved to a node with memory which is at the closest
  relative distance per the SLIT information.  Any node reassignments
  will result in the compression of the nodes and renumbering the nid
  values where appropriate.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-02-24 21:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-23 19:45 CPU only nodes (no memory) patch for NUMA/DISCONTIG Robert Picco
2004-02-23 21:16 ` David Mosberger
2004-02-23 21:36 ` Jesse Barnes
2004-02-24 14:40 ` Robert Picco
2004-02-24 21:54 ` David Mosberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox