From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: linux-kernel@vger.kernel.org, xen-devel@lists.xensource.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: [PATCH 5/7] xen/setup: Transfer MFNs from non-RAM E820 entries and gaps to E820 RAM
Date: Fri, 30 Mar 2012 16:37:28 -0400
Message-ID: <1333139850-28456-6-git-send-email-konrad.wilk@oracle.com>
In-Reply-To: <1333139850-28456-1-git-send-email-konrad.wilk@oracle.com>
When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made-up E820 map.
The nr_pages value defines the range of PFNs, from zero to nr_pages,
that have a valid Machine Frame Number (MFN) underneath them. The
E820 mirrors that (with the VGA hole):
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 00000000000a0000 (usable)
Xen: 00000000000a0000 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 0000000080800000 (usable)
The fun comes when a PV guest is run with the system E820. That
can be either the initial domain or a PCI PV guest, and there the
E820 looks like the real thing:
BIOS-provided physical RAM map:
Xen: 0000000000000000 - 000000000009e000 (usable)
Xen: 000000000009ec00 - 0000000000100000 (reserved)
Xen: 0000000000100000 - 0000000020000000 (usable)
Xen: 0000000020000000 - 0000000020200000 (reserved)
Xen: 0000000020200000 - 0000000040000000 (usable)
Xen: 0000000040000000 - 0000000040200000 (reserved)
Xen: 0000000040200000 - 00000000bad80000 (usable)
Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..
With that, overlaying the nr_pages directly on the E820 does not
work, as there are gaps and non-RAM regions that won't be used
by the memory allocator. The 'xen_release_chunk' helps with that
by punching holes in the P2M (PFN to MFN lookup tree) for those
regions, and it tells us:
Freeing 20000-20200 pfn range: 512 pages freed
Freeing 40000-40200 pfn range: 512 pages freed
Freeing bad80-badf4 pfn range: 116 pages freed
Freeing badf6-bae7f pfn range: 137 pages freed
Freeing bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory
Those 283999 pages are subtracted from the nr_pages and are returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory, as the nr_pages has been reduced by
the number of pages residing within the PCI hole. It can balloon up
to that if desired using 'xl mem-set 0 8092', but the balloon driver
is not always compiled in for the initial domain.
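To make the page accounting above easier to check by hand, here is a
minimal user-space sketch (not kernel code). The helper names
pfn_range_pages and pages_to_kib are hypothetical, and the 4 KiB page
size is an assumption that matches x86.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86 */

/* Number of pages in the half-open PFN range [start_pfn, end_pfn). */
static unsigned long pfn_range_pages(unsigned long start_pfn,
				     unsigned long end_pfn)
{
	assert(start_pfn <= end_pfn);
	return end_pfn - start_pfn;
}

/* Convert a page count to KiB. */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages << (PAGE_SHIFT - 10);
}
```

For instance, pfn_range_pages(0xbb000, 0x100000) gives the 282624 pages
of the last "Freeing" line, and pages_to_kib(283999) gives 1135996 KiB,
which is the roughly 1GB mentioned above.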
The 'xen_exchange_chunk' solves this by transferring the
MFNs that would have been freed to the E820_RAM entries that
are past the nr_pages, using the early_set_phys_to_machine
mechanism that allows the P2M tree to allocate new leaves during
early bootup.
It does that by copying the MFNs to the E820_RAM that has not
been used and setting the old PFNs to INVALID_P2M_ENTRY.
The end result is that the kernel can now boot with the
nr_pages without having to subtract the 283999 pages.
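The idea can be illustrated with a toy user-space model. This is only
a sketch under stated assumptions: the P2M is modelled as a flat array
indexed by PFN (the real one is a tree), INVALID_ENTRY stands in for
INVALID_P2M_ENTRY, and toy_exchange is a hypothetical name, not the
function added by this patch.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_ENTRY (~0UL)	/* stand-in for INVALID_P2M_ENTRY */

/*
 * Move up to 'room' MFNs from the PFNs in [start_pfn, end_pfn) to
 * consecutive PFNs starting at dest_pfn, and invalidate the source
 * PFNs. Returns the number of pages moved.
 */
static unsigned long toy_exchange(unsigned long *p2m, size_t p2m_len,
				  unsigned long start_pfn,
				  unsigned long end_pfn,
				  unsigned long dest_pfn,
				  unsigned long room)
{
	unsigned long pfn, done = 0;

	assert(start_pfn <= end_pfn && end_pfn <= p2m_len);

	for (pfn = start_pfn; pfn < end_pfn && done < room; pfn++) {
		if (p2m[pfn] == INVALID_ENTRY)
			break;	/* no MFN behind this PFN */
		if (dest_pfn >= p2m_len)
			break;	/* ran off the end of the toy P2M */

		p2m[dest_pfn++] = p2m[pfn];	/* MFN now backs the RAM PFN */
		p2m[pfn] = INVALID_ENTRY;	/* punch the hole */
		done++;
	}
	return done;
}
```

With nr_pages = 4, a two-page hole at PFNs 2-3 and unused E820_RAM
starting at PFN 4, calling toy_exchange(p2m, 8, 2, 4, 4, 4) moves both
MFNs to PFNs 4 and 5 and leaves PFNs 2 and 3 invalid, so no page has to
be handed back to the hypervisor.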
We will now get:
-Released 283999 pages of unused memory
+Exchanged 283999 pages
.. snip..
-Memory: 6487732k/9208688k available (5817k kernel code, 1136060k absent, 1584896k reserved, 2900k data, 692k init)
+Memory: 6503888k/8072692k available (5817k kernel code, 1136060k absent, 432744k reserved, 2900k data, 692k init)
which is more in line with classic XenOLinux.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/xen/setup.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 82 insertions(+), 3 deletions(-)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1ba8dff..2a12143 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -120,12 +120,89 @@ static unsigned long __init xen_release_chunk(unsigned long start,
return len;
}
+static unsigned long __init xen_exchange_chunk(unsigned long start_pfn,
+ unsigned long end_pfn, unsigned long nr_pages, unsigned long exchanged,
+ unsigned long *pages_left, const struct e820entry *list,
+ size_t map_size)
+{
+ const struct e820entry *entry;
+ unsigned int i;
+ unsigned long credits = (end_pfn - start_pfn) + *pages_left;
+ unsigned long done = 0;
+
+ for (i = 0, entry = list; i < map_size; i++, entry++) {
+ unsigned long s_pfn;
+ unsigned long e_pfn;
+ unsigned long pfn;
+ unsigned long dest_pfn;
+ long nr;
+
+ if (credits == 0)
+ break;
+
+ if (entry->type != E820_RAM)
+ continue;
+
+ e_pfn = PFN_UP(entry->addr + entry->size);
+
+ /* We only care about E820 _after_ the xen_start_info->nr_pages */
+ if (e_pfn <= nr_pages)
+ continue;
+
+ s_pfn = PFN_DOWN(entry->addr);
+ /* If the E820 falls within the nr_pages, we want to start
+ * at the nr_pages PFN (plus whatever we already had exchanged)
+ * If that would mean going past the E820 entry, skip it
+ */
+ if (s_pfn <= nr_pages) {
+ nr = e_pfn - exchanged - nr_pages;
+ dest_pfn = nr_pages + exchanged;
+ } else {
+ nr = e_pfn - exchanged - s_pfn;
+ dest_pfn = s_pfn + exchanged;
+ }
+ /* If we had filled this E820_RAM entry, go to the next one. */
+ if (nr <= 0)
+ continue;
+
+ pr_debug("[%lx->%lx] (starting at %lx and have space for %ld pages) will move %ld pages from [%lx->%lx]\n",
+ s_pfn, e_pfn, dest_pfn, nr, credits, start_pfn, end_pfn);
+
+ for (pfn = start_pfn; pfn < start_pfn + nr; pfn++) {
+ unsigned long mfn = pfn_to_mfn(pfn);
+
+ if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
+ break;
+
+ if (!early_set_phys_to_machine(dest_pfn, mfn))
+ break;
+
+ /* You would think we should do HYPERVISOR_update_va_mapping
+ * but we don't need to as the hypervisor only sets up the
+ * initial pagetables up to nr_pages, and we stick the MFNs
+ * past that.
+ */
+ __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+ ++dest_pfn;
+ ++done;
+ if (--credits == 0)
+ break;
+ }
+ }
+ if (done)
+ printk(KERN_INFO "Transferred from %lx->%lx range %lu pages\n", start_pfn, end_pfn, done);
+ /* How many left on the next iteration */
+ *pages_left = credits;
+ return done;
+}
static unsigned long __init xen_set_identity_and_release(
const struct e820entry *list, size_t map_size, unsigned long nr_pages)
{
phys_addr_t start = 0;
unsigned long released = 0;
unsigned long identity = 0;
+ unsigned long exchanged = 0;
+ unsigned long credits = 0;
const struct e820entry *entry;
int i;
@@ -151,17 +228,19 @@ static unsigned long __init xen_set_identity_and_release(
end_pfn = PFN_UP(entry->addr);
if (start_pfn < end_pfn) {
- if (start_pfn < nr_pages)
+ exchanged += xen_exchange_chunk(start_pfn, end_pfn, nr_pages,
+ exchanged, &credits, list, map_size);
+ if (start_pfn < nr_pages) {
released += xen_release_chunk(
start_pfn, min(end_pfn, nr_pages));
-
+ }
identity += set_phys_range_identity(
start_pfn, end_pfn);
}
start = end;
}
}
-
+ printk(KERN_INFO "Exchanged %lu pages\n", exchanged);
printk(KERN_INFO "Released %lu pages of unused memory\n", released);
printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
--
1.7.7.5