From: Chris Lalancette <clalance@redhat.com>
To: xen-devel@lists.xensource.com
Subject: [PATCH]: Allow Xen to boot/run on large memory (>64G) machines
Date: Wed, 21 Feb 2007 19:38:12 -0500
Message-ID: <45DCE5F4.6060007@redhat.com>
All,
I've been tracking down a problem where dom0 refuses to boot on very large
memory x86_64 machines. Here's what happens:
The hypervisor starts up with 1GB in the DMA zone. Two large allocations come
out of the DMA zone: the frame table (in init_frametable()) and the memory for
dom0 (in construct_dom0()). With a lot of memory in the box, init_frametable()
takes most of the DMA zone, so much so that there is no room left for the
allocation in construct_dom0(), and dom0 fails to boot with:
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Not enough RAM for domain 0 allocation.
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
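To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes 4KB pages and a 40-byte frame table entry per page; the entry size is an assumption for illustration, not a figure taken from the tree, and frame_table_bytes() is a hypothetical helper, not a Xen function.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Xen: bytes of frame table needed to
 * describe ram_bytes of memory with one entry_size-byte record per page. */
uint64_t frame_table_bytes(uint64_t ram_bytes, uint64_t page_size,
                           uint64_t entry_size)
{
    uint64_t nr_pages;

    if (page_size == 0)
        return 0;

    nr_pages = ram_bytes / page_size;   /* one frame table entry per page */
    return nr_pages * entry_size;
}
```

Under those assumed values, frame_table_bytes(96ULL << 30, 4096, 40) is 1006632960 bytes, i.e. about 960MB of the 1GB zone, which leaves very little for dom0.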
The solution (suggested by Keir) is to allocate the frame table out of
high memory instead of the DMA zone. The attached patch (against 3.0.3, but the
problem is the same in unstable) does this. I tested this on a 96GB
machine: without the patch, the machine rebooted as described above; with
the patch, I was able to boot dom0 and create a PV guest with 92GB of memory.
I have only compile-tested this on ia64, but I don't see anything in it that
should cause problems there.
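The top-down search can be sketched on a toy bitmap as follows. This is a minimal illustration, not the code in the patch: TOY_MAX_PAGE, toy_allocated, toy_claim() and toy_alloc_reverse() are all hypothetical names standing in for max_page, the boot bitmap, check_and_map_page() and alloc_boot_pages_reverse(). Unlike the loop in the attached patch, the sketch rounds the starting page down to a multiple of the alignment and stops before the unsigned counter can wrap below zero.

```c
#include <stdbool.h>

#define TOY_MAX_PAGE 64UL                  /* hypothetical toy machine size */

static bool toy_allocated[TOY_MAX_PAGE];   /* stand-in for the boot bitmap */

/* Mark [pg, pg + nr) allocated and return true if the whole range is free. */
static bool toy_claim(unsigned long pg, unsigned long nr)
{
    unsigned long i;

    for (i = 0; i < nr; i++)
        if (toy_allocated[pg + i])
            return false;

    for (i = 0; i < nr; i++)
        toy_allocated[pg + i] = true;

    return true;
}

/* Search from the top of memory downwards. Returns the first page of the
 * claimed range, or 0 on failure (page 0 is never handed out). */
unsigned long toy_alloc_reverse(unsigned long nr, unsigned long align)
{
    unsigned long pg;

    if (nr == 0 || align == 0 || nr > TOY_MAX_PAGE)
        return 0;

    /* Highest start that still fits, rounded down to a multiple of align. */
    pg = ((TOY_MAX_PAGE - nr) / align) * align;

    /* pg is unsigned: stop while pg >= align so the subtraction cannot wrap. */
    for (; pg >= align; pg -= align)
        if (toy_claim(pg, nr))
            return pg;

    return 0;
}
```

On the empty 64-page toy bitmap, successive 8-page requests with 8-page alignment come back from the top down (56, then 48), leaving the low pages free for later allocations.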
Note that this is not the end of the story, however. For even larger
machines, it can *still* be the case that the allocation in construct_dom0()
fails; in particular, if the order goes above 17, it will fail in the same way.
One way to fix it would be to just allocate that memory out of the normal zone
for x86_64, as well; however, I'm not sure if this will break anything else.
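For a sense of scale, the arithmetic behind the order limit can be sketched as below, assuming 4KB pages. order_to_bytes() and pages_to_order() are hypothetical helpers written for this illustration, not functions from the tree.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12   /* assumed 4KB pages */

/* Bytes covered by one allocation of the given order (2^order pages).
 * Valid for order <= 51 so the shift stays within 64 bits. */
uint64_t order_to_bytes(unsigned int order)
{
    return (uint64_t)1 << (order + PAGE_SHIFT_4K);
}

/* Smallest order whose 2^order pages cover nr_pages. */
unsigned int pages_to_order(uint64_t nr_pages)
{
    unsigned int order = 0;

    while (order < 63 && ((uint64_t)1 << order) < nr_pages)
        order++;

    return order;
}
```

With these definitions, order 17 is 512MB and order 18 is 1GB, the full size of the DMA zone mentioned above, so an order-18 request cannot be met from that zone once anything else has been allocated from it.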
Any comments?
Signed-off-by: Chris Lalancette <clalance@redhat.com>
[-- Attachment #2: xen-hugemem-3.patch --]
[-- Type: text/x-patch, Size: 2952 bytes --]
diff -urp xen.orig/arch/x86/mm.c xen/arch/x86/mm.c
--- xen.orig/arch/x86/mm.c 2007-02-21 14:45:38.000000000 -0500
+++ xen/arch/x86/mm.c 2007-02-21 16:11:34.000000000 -0500
@@ -179,7 +179,15 @@ void __init init_frametable(void)
for ( i = 0; i < nr_pages; i += page_step )
{
+#ifdef __x86_64__
+ /* for x86_64 we want to allocate the frame table from the top
+ * of memory rather than the bottom; otherwise, on large memory
+ * machines (> 64G), we exhaust DMA memory, and dom0 cannot boot
+ */
+ mfn = alloc_boot_pages_reverse(min(nr_pages - i, page_step), page_step);
+#else
mfn = alloc_boot_pages(min(nr_pages - i, page_step), page_step);
+#endif
if ( mfn == 0 )
panic("Not enough memory for frame table\n");
map_pages_to_xen(
diff -urp xen.orig/common/page_alloc.c xen/common/page_alloc.c
--- xen.orig/common/page_alloc.c 2006-10-16 16:07:17.000000000 -0400
+++ xen/common/page_alloc.c 2007-02-21 16:09:38.000000000 -0500
@@ -213,26 +213,44 @@ void init_boot_pages(paddr_t ps, paddr_t
}
}
-unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
+static unsigned long check_and_map_page(unsigned long pg, unsigned long nr_pfns)
{
- unsigned long pg, i;
+ unsigned long i;
- for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align )
+ for ( i = 0; i < nr_pfns; i++ )
+ if ( allocated_in_map(pg + i) )
+ break;
+
+ if ( i == nr_pfns )
{
- for ( i = 0; i < nr_pfns; i++ )
- if ( allocated_in_map(pg + i) )
- break;
+ map_alloc(pg, nr_pfns);
+ return pg;
+ }
- if ( i == nr_pfns )
- {
- map_alloc(pg, nr_pfns);
+ return 0;
+}
+
+unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
+{
+ unsigned long pg;
+
+ for ( pg = 0; (pg + nr_pfns) < max_page; pg += pfn_align )
+ if (check_and_map_page(pg, nr_pfns))
return pg;
- }
- }
return 0;
}
+unsigned long alloc_boot_pages_reverse(unsigned long nr_pfns, unsigned long pfn_align)
+{
+ unsigned long pg;
+
+ for ( pg = (max_page - nr_pfns); pg > 0; pg -= pfn_align )
+ if (check_and_map_page(pg, nr_pfns))
+ return pg;
+
+ return 0;
+}
/*************************
diff -urp xen.orig/include/xen/mm.h xen/include/xen/mm.h
--- xen.orig/include/xen/mm.h 2006-10-16 16:07:18.000000000 -0400
+++ xen/include/xen/mm.h 2007-02-21 15:57:46.000000000 -0500
@@ -39,6 +39,7 @@ struct page_info;
/* Boot-time allocator. Turns into generic allocator after bootstrap. */
paddr_t init_boot_allocator(paddr_t bitmap_start);
void init_boot_pages(paddr_t ps, paddr_t pe);
+unsigned long alloc_boot_pages_reverse(unsigned long nr_pfns, unsigned long pfn_align);
unsigned long alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align);
void end_boot_allocator(void);