public inbox for linux-kernel@vger.kernel.org
From: Haicheng Li <haicheng.li@linux.intel.com>
To: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.
Date: Thu, 21 Jan 2010 00:40:02 +0800
Message-ID: <4B5731E2.4040207@linux.intel.com>
In-Reply-To: <alpine.DEB.2.00.1001191513010.24223@chino.kir.corp.google.com>

David Rientjes wrote:
> On Tue, 19 Jan 2010, Haicheng Li wrote:
>> David, per my understanding, your concern should be like, with this fix, if
>> 3rd or 4th entry of Node0 has no address range, then Node0 won't be recovered
>> with oldnode and won't be cleared in nodes_parsed. But how is it handled by
>> old code?
>>
>
> It's not evident with your machine because you do not have two SRAT
> entries for the same node id, one without ACPI_SRAT_MEM_HOT_PLUGGABLE and
> the other with ACPI_SRAT_MEM_HOT_PLUGGABLE.
>
> The old code would preserve the address range for the former in oldnode
> and then reset its data in the struct bootnode since nodes_parsed has a
> bit set for that node.  That's needed by later code that I've mentioned:
> acpi_get_nodes(), specifically, which breaks with your patch in addition
> to nodes_cover_memory() and e820_register_active_regions().
>
> Only when the previous oldnode entry does not have a valid address range,
> meaning it is [0, 0), does the bit get cleared in nodes_parsed.

Understood. The old code is meant to make sure nodes_parsed _NEVER_ includes a node whose memory
regions are all hotpluggable.
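
For illustration, here is a minimal user-space sketch of that behaviour. It is only a model, not the
kernel code: the 32-bit mask standing in for nodemask_t, MAX_NODES, and parse_mem_entry() are
assumptions made for the example, and the range-extension branch is a simplification of what
acpi_numa_memory_affinity_init() does.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8	/* hypothetical small limit for this sketch */

struct bootnode {
	uint64_t start, end;
};

static struct bootnode nodes[MAX_NODES];
static uint32_t nodes_parsed;	/* bit n set: node n has a valid range */

/* Model of parsing one SRAT memory affinity entry (hypothetical helper). */
static void parse_mem_entry(int node, uint64_t start, uint64_t end,
			    bool hotpluggable)
{
	struct bootnode *nd = &nodes[node];
	struct bootnode oldnode = *nd;	/* remember the previous range */
	uint32_t bit = 1u << node;

	if (!(nodes_parsed & bit)) {
		/* first entry for this node: take the range as is */
		nodes_parsed |= bit;
		nd->start = start;
		nd->end = end;
	} else {
		/* later entry: extend the existing range */
		if (start < nd->start)
			nd->start = start;
		if (nd->end < end)
			nd->end = end;
	}

	if (hotpluggable) {
		/* restore nodes[node]; the hotplug range is tracked elsewhere */
		*nd = oldnode;
		/* nothing valid was there before: drop the node again */
		if ((nd->start | nd->end) == 0)
			nodes_parsed &= ~bit;
	}
}
```

With this model, a node that has one ordinary entry followed by a hotpluggable entry keeps its
original range and its bit, while a node described only by hotpluggable entries ends up with
[0, 0) and no bit set.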

>> - it recovers the node from oldnode as long as the current entry is HOT_PLUGGABLE, so
>> it handles the recovery issue. But I think the following patch can simply fix it as
>> well.
>>
>
> If it's not ACPI_SRAT_MEM_HOT_PLUGGABLE, we know the address range is
> already valid given the sanity checks that it has successfully passed
> through in acpi_numa_memory_affinity_init(), so we require no further
> checking.  However, your patch will not reset the previous address range
> when an ACPI_SRAT_MEM_HOT_PLUGGABLE entry is found for the same address
> range and you're leaving the bit set in nodes_parsed.

I see. The precondition is that nodes_parsed should not include such a hotpluggable node, so the
data for hotpluggable memory should be kept in nodes_add[] rather than in nodes[].

>>> cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
>>> should be set in nodes_parsed if its corresponding node does not have a
>>> valid address range.
>> For a node that has _NEITHER_ CPU nor memory, like Node1, cpu_nodes_parsed cannot
>> handle it.
>>
>
> It most certainly can since its sole purpose is to include memoryless
> nodes in node_possible_map.  It has no other use case that would break as
> the result of adding hotpluggable nodes, hence the reason I suggested
> renaming it no_mem_nodes_parsed.

Yeah, so the key point is which mask should keep hotpluggable nodes: cpu_nodes_parsed or
nodes_parsed? Actually, I now agree with you on this: let cpu_nodes_parsed keep hotpluggable nodes,
since that won't break any old code. Originally my patch wanted nodes_parsed to keep hotpluggable
nodes, which would make things complex.

But the name "no_mem_nodes_parsed" seems convoluted too, because (from the code logic) this nodemask
is usually based on the CPU/APIC Affinity Structure.
How about renaming cpu_nodes_parsed to "rest_nodes_parsed" (as opposed to "mem_nodes_parsed"), since
it handles
   - nodes with CPUs
   - nodes with hotpluggable memory regions
?

>>> We have a reasonable expectation that nodes_parsed represents memory nodes
>>> given its use for e820_register_active_regions() and nodes_cover_memory() as
>>> well as acpi_get_nodes() for NUMA emulation, for example, which would be
>>> broken with this patch.  See dc0985519.
>>>
>> either nodes_cover_memory() or e820_register_active_regions() or
>> acpi_get_nodes(), they all have node-addr-range check code, if the
>> node-addr-range is invalid, they won't be harmed.
>>
>
> Wrong, acpi_get_nodes() does not have such a check; it only iterates over
> nodes_parsed.  In other words, you'd be starting a new requirement for
> nodes_parsed with your patch: it would now be necessary to check for a
> valid (non-zero) address range for each set bit.  Instead, I'm suggesting
> that nodes_parsed represent only nodes with valid memory, which is a
> reasonable expectation given the semantics of both it and cpu_nodes_parsed
> to handle their memoryless counterparts.

Agreed. In terms of this, using nodes_parsed to represent only nodes with valid memory keeps
things simple.
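
To make the point about acpi_get_nodes() concrete, here is a small user-space sketch of a loop that
copies ranges for every set bit without checking them. The helper names get_nodes() and
count_empty(), the 32-bit mask, and MAX_NODES are assumptions for this example only; the loop body
mirrors the shape of the kernel function but is not the kernel code.

```c
#include <stdint.h>

#define MAX_NODES 8	/* hypothetical small limit for this sketch */

struct bootnode {
	uint64_t start, end;
};

/*
 * Copy the range of every node whose bit is set in mask.
 * As in the loop discussed above, there is no address range check.
 */
static int get_nodes(uint32_t mask, const struct bootnode *nodes,
		     struct bootnode *physnodes)
{
	int ret = 0;

	for (int i = 0; i < MAX_NODES; i++) {
		if (!(mask & (1u << i)))
			continue;
		physnodes[ret].start = nodes[i].start;
		physnodes[ret].end = nodes[i].end;
		ret++;
	}
	return ret;
}

/* Count returned entries whose range is empty (start == end). */
static int count_empty(const struct bootnode *physnodes, int n)
{
	int empty = 0;

	for (int i = 0; i < n; i++)
		if (physnodes[i].start == physnodes[i].end)
			empty++;
	return empty;
}
```

If the mask contains a bit for a node whose range is [0, 0), the caller receives an empty physical
node; if the mask only ever contains nodes with valid memory, no such check is needed.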

> In other words, the following should easily fix the issue without breaking
> the existing logic that preserves the old address range for node ids that
> have SRAT entries both with and without ACPI_SRAT_MEM_HOT_PLUGGABLE.
> Could you give it a try?

Of course, it fixes the issue because node_possible_map now includes hotpluggable nodes, so
nr_node_ids becomes equal to the maximum number of possible nodes on the motherboard ;).
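
A rough user-space sketch of that relationship, assuming nr_node_ids is derived as the highest
possible node id plus one. The helpers possible_map() and nr_ids_from_mask() and the 32-bit masks
are hypothetical stand-ins for nodes_or() and the nodemask_t machinery.

```c
#include <stdint.h>

/* Union of the memory mask and the "rest" (CPU / hotpluggable) mask. */
static uint32_t possible_map(uint32_t mem_nodes, uint32_t rest_nodes)
{
	return mem_nodes | rest_nodes;
}

/* Highest set bit plus one; 0 if no node is possible. */
static int nr_ids_from_mask(uint32_t possible)
{
	int highest = -1;

	for (int i = 0; i < 32; i++)
		if (possible & (1u << i))
			highest = i;
	return highest + 1;
}
```

For example, with memory only on node 0 and hotpluggable regions reported for nodes 1 to 3, the
union covers nodes 0 to 3 and the count is 4; if the hotpluggable nodes are left out of both masks,
the count stays at 1.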

Let's add more changes to fix the naming issue as well, since the current names make the code logic
too confusing to follow. How about the patch below?
---
  arch/x86/mm/srat_64.c |   39 +++++++++++++++++++++++++--------------
  1 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..aebbdd4 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -27,8 +27,17 @@ int acpi_numa __initdata;

  static struct acpi_table_slit *acpi_slit;

-static nodemask_t nodes_parsed __initdata;
-static nodemask_t cpu_nodes_parsed __initdata;
+/* mem_nodes_parsed:
+ *  - nodes with memory on
+ *
+ * rest_nodes_parsed:
+ *  - nodes with CPU on
+ *  - nodes with hotpluggable memory region
+ *
+ * We union these two nodemasks to get node_possible_map.
+ */
+static nodemask_t mem_nodes_parsed __initdata;
+static nodemask_t rest_nodes_parsed __initdata;
  static struct bootnode nodes[MAX_NUMNODES] __initdata;
  static struct bootnode nodes_add[MAX_NUMNODES];

@@ -134,7 +143,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)

  	apic_id = pa->apic_id;
  	apicid_to_node[apic_id] = node;
-	node_set(node, cpu_nodes_parsed);
+	node_set(node, rest_nodes_parsed);
  	acpi_numa = 1;
  	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
  	       pxm, apic_id, node);
@@ -168,7 +177,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
  	else
  		apic_id = pa->apic_id;
  	apicid_to_node[apic_id] = node;
-	node_set(node, cpu_nodes_parsed);
+	node_set(node, rest_nodes_parsed);
  	acpi_numa = 1;
  	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
  	       pxm, apic_id, node);
@@ -229,9 +238,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
  			printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
  	}

-	if (changed)
+	if (changed) {
+		node_set(node, rest_nodes_parsed);
  		printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
  				 nd->start, nd->end);
+	}
  }

  /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
@@ -278,7 +289,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
  	}
  	nd = &nodes[node];
  	oldnode = *nd;
-	if (!node_test_and_set(node, nodes_parsed)) {
+	if (!node_test_and_set(node, mem_nodes_parsed)) {
  		nd->start = start;
  		nd->end = end;
  	} else {
@@ -296,7 +307,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
  		/* restore nodes[node] */
  		*nd = oldnode;
  		if ((nd->start | nd->end) == 0)
-			node_clear(node, nodes_parsed);
+			node_clear(node, mem_nodes_parsed);
  	}

  	node_memblk_range[num_node_memblks].start = start;
@@ -313,7 +324,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
  	unsigned long pxmram, e820ram;

  	pxmram = 0;
-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
  		unsigned long s = nodes[i].start >> PAGE_SHIFT;
  		unsigned long e = nodes[i].end >> PAGE_SHIFT;
  		pxmram += e - s;
@@ -341,7 +352,7 @@ int __init acpi_get_nodes(struct bootnode *physnodes)
  	int i;
  	int ret = 0;

-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
  		physnodes[ret].start = nodes[i].start;
  		physnodes[ret].end = nodes[i].end;
  		ret++;
@@ -370,7 +381,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
  		return -1;
  	}

-	for_each_node_mask(i, nodes_parsed)
+	for_each_node_mask(i, mem_nodes_parsed)
  		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
  						nodes[i].end >> PAGE_SHIFT);
  	/* for out of order entries in SRAT */
@@ -381,7 +392,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
  	}

  	/* Account for nodes with cpus and no memory */
-	nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
+	nodes_or(node_possible_map, mem_nodes_parsed, rest_nodes_parsed);

  	/* Finally register nodes */
  	for_each_node_mask(i, node_possible_map)
@@ -416,7 +427,7 @@ static int __init find_node_by_addr(unsigned long addr)
  	int ret = NUMA_NO_NODE;
  	int i;

-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
  		/*
  		 * Find the real node that this emulated node appears on.  For
  		 * the sake of simplicity, we only use a real node's starting
@@ -466,10 +477,10 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
  		__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
  	memcpy(apicid_to_node, fake_apicid_to_node, sizeof(apicid_to_node));

-	nodes_clear(nodes_parsed);
+	nodes_clear(mem_nodes_parsed);
  	for (i = 0; i < num_nodes; i++)
  		if (fake_nodes[i].start != fake_nodes[i].end)
-			node_set(i, nodes_parsed);
+			node_set(i, mem_nodes_parsed);
  }

  static int null_slit_node_compare(int a, int b)
-- 
1.5.4.4


