From: Yinghai Lu <yinghai@kernel.org>
To: mingo@elte.hu, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	rdreier@cisco.com, Suresh Siddha <suresh.b.siddha@intel.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Huang Ying <ying.huang@intel.com>,
	rientjes@google.com
Subject: Re: kexec boot regression
Date: Tue, 15 Dec 2009 14:24:59 -0800
Message-ID: <4B280CBB.9090406@kernel.org>
In-Reply-To: <20091215215214.GF28252@kernel.dk>

Jens Axboe wrote:
> On Tue, Dec 15 2009, Jens Axboe wrote:
>>> oh, i post one patch last week, 
>>>
>>> can you check it?
>> Sure, let me try it. I already found out that commit 8716273c is the
>> guilty one (x86: Export srat physical topology).
> 
> Confirmed, -git with that patch works as well. So that's all of them I
> think, can we please get this expedited in so that -rc1 will work?
> Thanks!

updated version:

[PATCH] x86: fix checking of SRAT when node0 ram is not from 0 -v3

Found one system that boots from socket1 instead of socket0; its SRAT gets rejected:

[    0.000000] SRAT: Node 1 PXM 0 0-a0000
[    0.000000] SRAT: Node 1 PXM 0 100000-80000000
[    0.000000] SRAT: Node 1 PXM 0 100000000-2080000000
[    0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000
[    0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000
[    0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000
[    0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000
[    0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000
[    0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000
[    0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000
...
[    0.000000] NUMA: Allocated memnodemap from 500000 - 701040
[    0.000000] NUMA: Using 20 for the hash shift.
[    0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used
[    0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used
[    0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used
[    0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used
[    0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used
[    0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used
[    0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used
[    0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used
[    0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used
[    0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used
[    0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used.
[    0.000000] SRAT: SRAT not used.

The early_node_map is not sorted, because node0, which has a non-zero start address, is registered first.

So sort it right after all regions are registered.
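
The sort step can be sketched in userspace C. This is only an illustration under stated assumptions: struct region, cmp_region(), sort_region_map() and demo_sort() are made-up names, and qsort() stands in for the kernel's sort(). The sample entries are the first four "Adding active range" lines from the log above, in registration order.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the kernel's struct node_active_region. */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;
	int nid;
};

/* Order two regions by start_pfn, lowest first. */
int cmp_region(const void *a, const void *b)
{
	const struct region *ra = a;
	const struct region *rb = b;

	if (ra->start_pfn < rb->start_pfn)
		return -1;
	return ra->start_pfn > rb->start_pfn;
}

/* Sort a region map in place so later walks see ascending PFNs. */
void sort_region_map(struct region *map, size_t nr)
{
	qsort(map, nr, sizeof(*map), cmp_region);
}

/* Node 0 starts at a non-zero PFN but was registered first. */
struct region example_map[] = {
	{ 0x2080000, 0x4080000, 0 },
	{ 0x0,       0x96,      1 },
	{ 0x100,     0x7f750,   1 },
	{ 0x100000,  0x2080000, 1 },
};

/* Sort the sample map and check that the lowest PFN now comes first. */
void demo_sort(void)
{
	sort_region_map(example_map, 4);
	assert(example_map[0].start_pfn == 0x0);
	assert(example_map[0].nid == 1);
}
```

After demo_sort() the node 1 entry starting at PFN 0 leads the array and the node 0 entry moves to the end.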

This also fixes the regression caused by 8716273c (x86: Export srat physical topology).

-v2: make it more robust for the cross-node case, e.g. node0 [0, 4g), [8g, 12g) and node1 [4g, 8g), [12g, 16g)
-v3: update comments.
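
For the cross-node layout mentioned in -v2, the difference between counting holes per node and across all nodes can be shown with a simplified userspace model. It is not the kernel's __absent_pages_in_range(): struct region, absent_pages(), ANY_NODE (a hypothetical stand-in for MAX_NUMNODES) and demo_holes() are made up for illustration, and the model assumes 4 KiB pages and non-overlapping regions.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_NODE (-1)	/* hypothetical: count regions of every node */

/* Illustrative stand-in for the kernel's struct node_active_region. */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;
	int nid;
};

/*
 * Count pages in [s, e) that no matching region covers.
 * With nid == ANY_NODE every region counts; otherwise only
 * the regions that belong to nid do.
 */
unsigned long absent_pages(const struct region *map, size_t nr, int nid,
			   unsigned long s, unsigned long e)
{
	unsigned long present = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long lo, hi;

		if (nid != ANY_NODE && map[i].nid != nid)
			continue;
		/* Clip the region to the requested range. */
		lo = map[i].start_pfn > s ? map[i].start_pfn : s;
		hi = map[i].end_pfn < e ? map[i].end_pfn : e;
		if (lo < hi)
			present += hi - lo;
	}
	return (e - s) - present;
}

/* node0 [0, 4g), [8g, 12g) and node1 [4g, 8g), [12g, 16g), in PFNs. */
const struct region cross_node_map[] = {
	{ 0x000000, 0x100000, 0 },
	{ 0x100000, 0x200000, 1 },
	{ 0x200000, 0x300000, 0 },
	{ 0x300000, 0x400000, 1 },
};

/* Compare both ways of counting over node0's span [0, 12g). */
void demo_holes(void)
{
	assert(absent_pages(cross_node_map, 4, 0, 0x0, 0x300000) == 0x100000);
	assert(absent_pages(cross_node_map, 4, ANY_NODE, 0x0, 0x300000) == 0);
}
```

In this model node0's span [0, 12g) has a 4g hole when only node0's regions count, and no hole when every node's regions count.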

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jens Axboe <jens.axboe@oracle.com>

---
 arch/x86/mm/srat_32.c |    2 ++
 arch/x86/mm/srat_64.c |    4 +++-
 include/linux/mm.h    |    3 +++
 mm/page_alloc.c       |    4 ++--
 4 files changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/mm/srat_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_32.c
+++ linux-2.6/arch/x86/mm/srat_32.c
@@ -267,6 +267,8 @@ int __init get_memcfg_from_srat(void)
 		e820_register_active_regions(chunk->nid, chunk->start_pfn,
 					     min(chunk->end_pfn, max_pfn));
 	}
+	/* for out of order entries in SRAT */
+	sort_node_map();
 
 	for_each_online_node(nid) {
 		unsigned long start = node_start_pfn[nid];
Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -317,7 +317,7 @@ static int __init nodes_cover_memory(con
 		unsigned long s = nodes[i].start >> PAGE_SHIFT;
 		unsigned long e = nodes[i].end >> PAGE_SHIFT;
 		pxmram += e - s;
-		pxmram -= absent_pages_in_range(s, e);
+		pxmram -= __absent_pages_in_range(i, s, e);
 		if ((long)pxmram < 0)
 			pxmram = 0;
 	}
@@ -373,6 +373,8 @@ int __init acpi_scan_nodes(unsigned long
 	for_each_node_mask(i, nodes_parsed)
 		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
 						nodes[i].end >> PAGE_SHIFT);
+	/* for out of order entries in SRAT */
+	sort_node_map();
 	if (!nodes_cover_memory(nodes)) {
 		bad_srat();
 		return -1;
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -1037,6 +1037,9 @@ extern void add_active_range(unsigned in
 extern void remove_active_range(unsigned int nid, unsigned long start_pfn,
 					unsigned long end_pfn);
 extern void remove_all_active_ranges(void);
+void sort_node_map(void);
+unsigned long __absent_pages_in_range(int nid, unsigned long start_pfn,
+						unsigned long end_pfn);
 extern unsigned long absent_pages_in_range(unsigned long start_pfn,
 						unsigned long end_pfn);
 extern void get_pfn_range_for_nid(unsigned int nid,
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -3569,7 +3569,7 @@ static unsigned long __meminit zone_span
  * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
  * then all holes in the requested range will be accounted for.
  */
-static unsigned long __meminit __absent_pages_in_range(int nid,
+unsigned long __meminit __absent_pages_in_range(int nid,
 				unsigned long range_start_pfn,
 				unsigned long range_end_pfn)
 {
@@ -4098,7 +4098,7 @@ static int __init cmp_node_active_region
 }
 
 /* sort the node_map by start_pfn */
-static void __init sort_node_map(void)
+void __init sort_node_map(void)
 {
 	sort(early_node_map, (size_t)nr_nodemap_entries,
 			sizeof(struct node_active_region),
