From: Yinghai Lu <yinghai@kernel.org>
To: Jiri Slaby <jirislaby@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Greg Thelen <gthelen@google.com>,
linux-mm@kvack.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch
Date: Fri, 05 Mar 2010 12:27:14 -0800 [thread overview]
Message-ID: <4B916922.3010807@kernel.org> (raw)
In-Reply-To: <4B90DC3C.1060000@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2163 bytes --]
On 03/05/2010 02:26 AM, Jiri Slaby wrote:
> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>> according to context
>> http://patchwork.kernel.org/patch/73893/
>>
>> Jiri,
>> please check current linus tree still have problem about mem_map is using that much low mem?
>
> Hi!
>
> Sorry, I don't have direct access to the machine. I might try to ask the
> owners to do so.
>
>> on my 1024g system first node has 128G ram, [2g, 4g) are mmio range.
>
> So where gets your mem_map allocated (I suppose you're running flat model)?
>
> Note that the failure we were seeing was with different amount of memory
> on different machines. Obviously because of different e820 reservations
> and driver requirements at boot time. So the required memory to trigger
> the error oscillated around 128G, sometimes being 130G.
>
> It triggered when mem_map fit exactly into 0-2G (and 2-4G was reserved)
> and no more space was there. If RAM was more than 130G, mem_map was
> above 4G boundary implicitly, so that there was enough space in the
> first 4G of memory for others with specific bootmem limitations.
>
>> with NO_BOOTMEM
>> [ 0.000000] a - 11
>> [ 0.000000] 19 40 - 80 95
>> [ 0.000000] 702 740 - 1000 1000
>> [ 0.000000] 331f 3340 - 3400 3400
>> [ 0.000000] 35dd - 3600
>> [ 0.000000] 37dd - 3800
>> [ 0.000000] 39dd - 3a00
>> [ 0.000000] 3bdd - 3c00
>> [ 0.000000] 3ddd - 3e00
>> [ 0.000000] 3fdd - 4000
>> [ 0.000000] 41dd - 4200
>> [ 0.000000] 43dd - 4400
>> [ 0.000000] 45dd - 4600
>> [ 0.000000] 47dd - 4800
>> [ 0.000000] 49dd - 4a00
>> [ 0.000000] 4bdd - 4c00
>> [ 0.000000] 4ddd - 4e00
>> [ 0.000000] 4fdd - 5000
>> [ 0.000000] 51dd - 5200
>> [ 0.000000] 93dd 9400 - 7d500 7d53b
>> [ 0.000000] 7f730 - 7f750
>> [ 0.000000] 100012 100040 - 100200 100200
>> [ 0.000000] 170200 170200 - 2080000 2080000
>> [ 0.000000] 2080065 2080080 - 2080200 2080200
>>
>> so PFN: 9400 - 7d500 are free.
>
> Could you explain more the dmesg output?
it will list free pfn range that will be use for slab...
attached is debug patch for print out without CONFIG_NO_BOOTMEM set.
YH
[-- Attachment #2: print_free_bootmem.patch --]
[-- Type: text/x-patch, Size: 3981 bytes --]
Subject: [PATCH -v3] x86: print bootmem free before and free_all_bootmem
so we could double check if we have enough low pages later
-v2: fix errors checkpatch.pl reported
-v3: move after pci_iommu_alloc, so could compare it with NO_BOOTMEM
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/mm/init_64.c | 2 +
include/linux/bootmem.h | 3 +
mm/bootmem.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 96 insertions(+)
Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c
@@ -335,6 +335,97 @@ static void __init __free(bootmem_data_t
BUG();
}
+static void __init print_all_bootmem_free_core(bootmem_data_t *bdata)
+{
+ int aligned;
+ unsigned long *map;
+ unsigned long start, end, count = 0;
+ unsigned long free_start = -1UL, free_end = 0;
+
+ if (!bdata->node_bootmem_map)
+ return;
+
+ start = bdata->node_min_pfn;
+ end = bdata->node_low_pfn;
+
+ /*
+ * If the start is aligned to the machines wordsize, we might
+ * be able to count it in bulks of that order.
+ */
+ aligned = !(start & (BITS_PER_LONG - 1));
+
+ printk(KERN_DEBUG "nid=%td start=0x%010lx end=0x%010lx aligned=%d\n",
+ bdata - bootmem_node_data, start, end, aligned);
+ map = bdata->node_bootmem_map;
+
+ while (start < end) {
+ unsigned long idx, vec;
+
+ idx = start - bdata->node_min_pfn;
+ vec = ~map[idx / BITS_PER_LONG];
+
+ if (aligned && vec == ~0UL && start + BITS_PER_LONG < end) {
+ if (free_start == -1UL) {
+ free_start = idx;
+ free_end = free_start + BITS_PER_LONG;
+ } else {
+ if (free_end == idx) {
+ free_end += BITS_PER_LONG;
+ } else {
+ /* there is gap, print old */
+ printk(KERN_DEBUG " free [0x%010lx - 0x%010lx]\n",
+ free_start + bdata->node_min_pfn,
+ free_end + bdata->node_min_pfn);
+ free_start = idx;
+ free_end = idx + BITS_PER_LONG;
+ }
+ }
+ count += BITS_PER_LONG;
+ } else {
+ unsigned long off = 0;
+
+ while (vec && off < BITS_PER_LONG) {
+ if (vec & 1) {
+ if (free_start == -1UL) {
+ free_start = idx + off;
+ free_end = free_start + 1;
+ } else {
+ if (free_end == (idx + off)) {
+ free_end++;
+ } else {
+ /* there is gap, print old */
+ printk(KERN_DEBUG " free [0x%010lx - 0x%010lx]\n",
+ free_start + bdata->node_min_pfn,
+ free_end + bdata->node_min_pfn);
+ free_start = idx + off;
+ free_end = free_start + 1;
+ }
+ }
+ count++;
+ }
+ vec >>= 1;
+ off++;
+ }
+ }
+ start += BITS_PER_LONG;
+ }
+
+ /* last one */
+ if (free_start != -1UL)
+ printk(KERN_DEBUG " free [0x%010lx - 0x%010lx]\n",
+ free_start + bdata->node_min_pfn,
+ free_end + bdata->node_min_pfn);
+ printk(KERN_DEBUG " total free 0x%010lx\n", count);
+}
+
+void __init print_bootmem_free(void)
+{
+ bootmem_data_t *bdata;
+
+ list_for_each_entry(bdata, &bdata_list, list)
+ print_all_bootmem_free_core(bdata);
+}
+
static int __init __reserve(bootmem_data_t *bdata, unsigned long sidx,
unsigned long eidx, int flags)
{
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -679,6 +679,8 @@ void __init mem_init(void)
pci_iommu_alloc();
+ print_bootmem_free();
+
/* clear_bss() already clear the empty_zero_page */
reservedpages = 0;
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -38,6 +38,9 @@ typedef struct bootmem_data {
} bootmem_data_t;
extern bootmem_data_t bootmem_node_data[];
+void print_bootmem_free(void);
+#else
+static inline void print_bootmem_free(void) {}
#endif
extern unsigned long bootmem_bootmap_pages(unsigned long);
next prev parent reply other threads:[~2010-03-05 20:28 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-04 21:21 mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Greg Thelen
2010-03-05 3:21 ` Johannes Weiner
2010-03-05 5:00 ` Yinghai Lu
2010-03-05 5:14 ` Yinghai Lu
2010-03-05 12:51 ` Johannes Weiner
2010-03-05 16:38 ` Yinghai
2010-03-05 5:17 ` Greg Thelen
2010-03-05 5:34 ` Greg Thelen
2010-03-05 18:41 ` Yinghai Lu
2010-03-05 18:41 ` Yinghai Lu
2010-03-05 19:09 ` Greg Thelen
2010-03-05 19:09 ` Greg Thelen
2010-03-05 20:38 ` [PATCH] x86/bootmem: introduce bootmem_default_goal Yinghai Lu
2010-03-05 20:38 ` Yinghai Lu
2010-03-06 5:44 ` please don't apply : bootmem: avoid DMA32 zone by default Yinghai Lu
2010-03-06 5:44 ` Yinghai Lu
2010-03-07 0:22 ` Andrew Morton
2010-03-07 0:22 ` Andrew Morton
2010-03-07 0:42 ` Yinghai Lu
2010-03-07 0:42 ` Yinghai Lu
2010-03-07 0:53 ` Yinghai Lu
2010-03-07 0:53 ` Yinghai Lu
2010-03-07 2:15 ` [PATCH] sparsemem: on no vmemmap path put mem_map on node high too Yinghai Lu
2010-03-07 1:03 ` please don't apply : bootmem: avoid DMA32 zone by default Paul Mackerras
2010-03-07 1:03 ` Paul Mackerras
2010-03-07 1:48 ` Stephen Rothwell
2010-03-07 9:16 ` Russell King
2010-03-07 9:16 ` Russell King
2010-03-05 23:58 ` mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch Johannes Weiner
2010-03-05 23:58 ` Johannes Weiner
2010-03-06 1:50 ` Yinghai Lu
2010-03-06 1:50 ` Yinghai Lu
2010-03-06 2:24 ` Johannes Weiner
2010-03-06 2:24 ` Johannes Weiner
2010-03-06 2:31 ` Yinghai Lu
2010-03-06 2:31 ` Yinghai Lu
2010-03-05 9:04 ` Yinghai Lu
2010-03-05 9:04 ` Yinghai Lu
2010-03-05 10:26 ` Jiri Slaby
2010-03-05 10:26 ` Jiri Slaby
2010-03-05 20:27 ` Yinghai Lu [this message]
2010-03-07 1:17 ` Yinghai Lu
2010-03-11 10:54 ` Jiri Slaby
2010-03-11 20:12 ` Yinghai Lu
2010-03-11 21:40 ` Jiri Slaby
2010-03-11 21:42 ` Yinghai Lu
2010-03-05 13:08 ` Johannes Weiner
2010-03-05 13:08 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B916922.3010807@kernel.org \
--to=yinghai@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=jirislaby@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.