From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Andi Kleen <ak@muc.de>
Cc: Chris Wright <chrisw@sous-sol.org>,
virtualization@lists.osdl.org, xen-devel@lists.xensource.com,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
William Lee Irwin III <wli@holomorphy.com>
Subject: [patch 05/26] Xen-paravirt_ops: paravirt_ops: hooks to set up initial pagetable
Date: Tue, 27 Feb 2007 00:13:42 -0800 [thread overview]
Message-ID: <20070227081630.650554241@goop.org> (raw)
In-Reply-To: 20070227081337.434798469@goop.org
[-- Attachment #1: paravirt-memory-init.patch --]
[-- Type: text/plain, Size: 12349 bytes --]
This patch introduces paravirt_ops hooks to control how the kernel's
initial pagetable is set up.
In the case of a native boot, the very early bootstrap code creates a
simple non-PAE pagetable to map the kernel and physical memory. When
the VM subsystem is initialized, it creates a proper pagetable which
respects the PAE mode, large pages, etc.
When booting under a hypervisor, there are many possibilities for what
paging environment the hypervisor establishes for the guest kernel, so
the constructon of the kernel's pagetable depends on the hypervisor.
In the case of Xen, the hypervisor boots the kernel with a fully
constructed pagetable, which is already using PAE if necessary. Also,
Xen requires particular care when constructing pagetables to make sure
all pagetables are always mapped read-only.
In order to make this easier, kernel's initial pagetable construction
has been changed to only allocate and initialize a pagetable page if
there's no page already present in the pagetable. This allows the Xen
paravirt backend to make a copy of the hypervisor-provided pagetable,
allowing the kernel to establish any more mappings it needs while
keeping the existing ones.
A slightly subtle point which is worth highlighting here is that Xen
requires all kernel mappings to share the same pte_t pages between all
pagetables, so that updating a kernel page's mapping in one pagetable
is reflected in all other pagetables. This makes it possible to
allocate a page and attach it to a pagetable without having to
explicitly enumerate that page's mapping in all pagetables.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
---
arch/i386/kernel/paravirt.c | 40 +++++++++++++++++++
arch/i386/mm/init.c | 87 +++++++++++++++++++++++++------------------
include/asm-i386/paravirt.h | 54 ++++++++++++++++++++++++++
include/asm-i386/pgtable.h | 3 +
4 files changed, 148 insertions(+), 36 deletions(-)
===================================================================
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -379,6 +379,43 @@ static void native_io_delay(void)
{
asm volatile("outb %al,$0x80");
}
+
+void native_pagetable_setup_start(pgd_t *base)
+{
+#ifdef CONFIG_X86_PAE
+ int i;
+
+ /*
+ * Init entries of the first-level page table to the
+ * zero page, if they haven't already been set up.
+ *
+ * In a normal native boot, we'll be running on a
+ * pagetable rooted in swapper_pg_dir, but not in PAE
+ * mode, so this will end up clobbering the mappings
+ * for the lower 24Mbytes of the address space,
+ * without affecting the kernel address space.
+ */
+ for (i = 0; i < USER_PTRS_PER_PGD; i++)
+ set_pgd(&base[i],
+ __pgd(__pa(empty_zero_page) | _PAGE_PRESENT));
+ memset(&base[USER_PTRS_PER_PGD], 0, sizeof(pgd_t));
+#endif
+}
+
+void native_pagetable_setup_done(pgd_t *base)
+{
+#ifdef CONFIG_X86_PAE
+ /*
+ * Add low memory identity-mappings - SMP needs it when
+ * starting up on an AP from real-mode. In the non-PAE
+ * case we already have these mappings through head.S.
+ * All user-space mappings are explicitly cleared after
+ * SMP startup.
+ */
+ set_pgd(&base[0], base[USER_PTRS_PER_PGD]);
+#endif
+}
+
static void native_flush_tlb(void)
{
@@ -627,6 +664,9 @@ struct paravirt_ops paravirt_ops = {
#endif
.set_lazy_mode = (void *)native_nop,
+ .pagetable_setup_start = native_pagetable_setup_start,
+ .pagetable_setup_done = native_pagetable_setup_done,
+
.flush_tlb_user = native_flush_tlb,
.flush_tlb_kernel = native_flush_tlb_global,
.flush_tlb_single = native_flush_tlb_single,
===================================================================
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -42,6 +42,7 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include <asm/sections.h>
+#include <asm/paravirt.h>
unsigned int __VMALLOC_RESERVE = 128 << 20;
@@ -62,6 +63,7 @@ static pmd_t * __init one_md_table_init(
#ifdef CONFIG_X86_PAE
pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE);
+
paravirt_alloc_pd(__pa(pmd_table) >> PAGE_SHIFT);
set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
pud = pud_offset(pgd, 0);
@@ -83,12 +85,10 @@ static pte_t * __init one_page_table_ini
{
if (pmd_none(*pmd)) {
pte_t *page_table = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
+
paravirt_alloc_pt(__pa(page_table) >> PAGE_SHIFT);
set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
- if (page_table != pte_offset_kernel(pmd, 0))
- BUG();
-
- return page_table;
+ BUG_ON(page_table != pte_offset_kernel(pmd, 0));
}
return pte_offset_kernel(pmd, 0);
@@ -119,7 +119,7 @@ static void __init page_table_range_init
pgd = pgd_base + pgd_idx;
for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd++, pgd_idx++) {
- if (pgd_none(*pgd))
+ if (!(pgd_val(*pgd) & _PAGE_PRESENT))
one_md_table_init(pgd);
pud = pud_offset(pgd, vaddr);
pmd = pmd_offset(pud, vaddr);
@@ -158,7 +158,11 @@ static void __init kernel_physical_mappi
pfn = 0;
for (; pgd_idx < PTRS_PER_PGD; pgd++, pgd_idx++) {
- pmd = one_md_table_init(pgd);
+ if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+ pmd = one_md_table_init(pgd);
+ else
+ pmd = pmd_offset(pud_offset(pgd, PAGE_OFFSET), PAGE_OFFSET);
+
if (pfn >= max_low_pfn)
continue;
for (pmd_idx = 0; pmd_idx < PTRS_PER_PMD && pfn < max_low_pfn; pmd++, pmd_idx++) {
@@ -167,20 +171,26 @@ static void __init kernel_physical_mappi
/* Map with big pages if possible, otherwise create normal page tables. */
if (cpu_has_pse) {
unsigned int address2 = (pfn + PTRS_PER_PTE - 1) * PAGE_SIZE + PAGE_OFFSET + PAGE_SIZE-1;
-
- if (is_kernel_text(address) || is_kernel_text(address2))
- set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_LARGE_EXEC));
- else
- set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_LARGE));
+ if (!pmd_present(*pmd)) {
+ if (is_kernel_text(address) || is_kernel_text(address2))
+ set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_LARGE_EXEC));
+ else
+ set_pmd(pmd, pfn_pmd(pfn, PAGE_KERNEL_LARGE));
+ }
pfn += PTRS_PER_PTE;
} else {
pte = one_page_table_init(pmd);
- for (pte_ofs = 0; pte_ofs < PTRS_PER_PTE && pfn < max_low_pfn; pte++, pfn++, pte_ofs++) {
- if (is_kernel_text(address))
- set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
- else
- set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
+ for (pte_ofs = 0;
+ pte_ofs < PTRS_PER_PTE && pfn < max_low_pfn;
+ pte++, pfn++, pte_ofs++, address += PAGE_SIZE) {
+ if (pte_present(*pte))
+ continue;
+
+ if (is_kernel_text(address))
+ set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
+ else
+ set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
}
}
}
@@ -337,19 +347,32 @@ extern void __init remap_numa_kva(void);
#define remap_numa_kva() do {} while (0)
#endif
+/*
+ * Build a proper pagetable for the kernel mappings. Up until this
+ * point, we've been running on some set of pagetables constructed by
+ * the boot process.
+ *
+ * If we're booting on native hardware, this will be a pagetable
+ * constructed in arch/i386/kernel/head.S, and not running in PAE mode
+ * (even if we'll end up running in PAE). The root of the pagetable
+ * will be swapper_pg_dir.
+ *
+ * If we're booting paravirtualized under a hypervisor, then there are
+ * more options: we may already be running PAE, and the pagetable may
+ * or may not be based in swapper_pg_dir. In any case,
+ * paravirt_pagetable_setup_start() will set up swapper_pg_dir
+ * appropriately for the rest of the initialization to work.
+ *
+ * In general, pagetable_init() assumes that the pagetable may already
+ * be partially populated, and so it avoids stomping on any existing
+ * mappings.
+ */
static void __init pagetable_init (void)
{
- unsigned long vaddr;
+ unsigned long vaddr, end;
pgd_t *pgd_base = swapper_pg_dir;
-#ifdef CONFIG_X86_PAE
- int i;
- /* Init entries of the first-level page table to the zero page */
- for (i = 0; i < PTRS_PER_PGD; i++)
- set_pgd(pgd_base + i, __pgd(__pa(empty_zero_page) | _PAGE_PRESENT));
-#else
- paravirt_alloc_pd(__pa(swapper_pg_dir) >> PAGE_SHIFT);
-#endif
+ paravirt_pagetable_setup_start(pgd_base);
/* Enable PSE if available */
if (cpu_has_pse) {
@@ -371,20 +394,12 @@ static void __init pagetable_init (void)
* created - mappings will be set by set_fixmap():
*/
vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
- page_table_range_init(vaddr, 0, pgd_base);
+ end = (FIXADDR_TOP + PMD_SIZE - 1) & PMD_MASK;
+ page_table_range_init(vaddr, end, pgd_base);
permanent_kmaps_init(pgd_base);
-#ifdef CONFIG_X86_PAE
- /*
- * Add low memory identity-mappings - SMP needs it when
- * starting up on an AP from real-mode. In the non-PAE
- * case we already have these mappings through head.S.
- * All user-space mappings are explicitly cleared after
- * SMP startup.
- */
- set_pgd(&pgd_base[0], pgd_base[USER_PTRS_PER_PGD]);
-#endif
+ paravirt_pagetable_setup_done(pgd_base);
}
#if defined(CONFIG_SOFTWARE_SUSPEND) || defined(CONFIG_ACPI_SLEEP)
===================================================================
--- a/include/asm-i386/paravirt.h
+++ b/include/asm-i386/paravirt.h
@@ -49,6 +49,9 @@ struct paravirt_ops
void (*arch_setup)(void);
char *(*memory_setup)(void);
void (*init_IRQ)(void);
+
+ void (*pagetable_setup_start)(pgd_t *pgd_base);
+ void (*pagetable_setup_done)(pgd_t *pgd_base);
void (*banner)(void);
@@ -185,6 +188,8 @@ struct paravirt_ops
extern struct paravirt_ops paravirt_ops;
+void native_pagetable_setup_start(pgd_t *pgd);
+
#ifdef CONFIG_X86_PAE
unsigned long long native_pte_val(pte_t);
unsigned long long native_pmd_val(pmd_t);
@@ -389,6 +394,17 @@ static inline void setup_secondary_clock
}
#endif
+static inline void paravirt_pagetable_setup_start(pgd_t *base)
+{
+ if (paravirt_ops.pagetable_setup_start)
+ (*paravirt_ops.pagetable_setup_start)(base);
+}
+
+static inline void paravirt_pagetable_setup_done(pgd_t *base)
+{
+ if (paravirt_ops.pagetable_setup_done)
+ (*paravirt_ops.pagetable_setup_done)(base);
+}
void native_set_pte(pte_t *ptep, pte_t pteval);
void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pteval);
@@ -615,5 +631,43 @@ 772:; \
call *paravirt_ops+PARAVIRT_read_cr0
#endif /* __ASSEMBLY__ */
+#else /* !CONFIG_PARAVIRT */
+#include <asm/pgtable.h>
+
+static inline void paravirt_pagetable_setup_start(pgd_t *base)
+{
+#ifdef CONFIG_X86_PAE
+ int i;
+
+ /*
+ * Init entries of the first-level page table to the
+ * zero page, if they haven't already been set up.
+ *
+ * In a normal native boot, we'll be running on a
+ * pagetable rooted in swapper_pg_dir, but not in PAE
+ * mode, so this will end up clobbering the mappings
+ * for the lower 24Mbytes of the address space,
+ * without affecting the kernel address space.
+ */
+ for (i = 0; i < USER_PTRS_PER_PGD; i++)
+ set_pgd(&base[i],
+ __pgd(__pa(empty_zero_page) | _PAGE_PRESENT));
+ memset(&base[USER_PTRS_PER_PGD], 0, sizeof(pgd_t));
+#endif
+}
+
+static inline void paravirt_pagetable_setup_done(pgd_t *base)
+{
+#ifdef CONFIG_X86_PAE
+ /*
+ * Add low memory identity-mappings - SMP needs it when
+ * starting up on an AP from real-mode. In the non-PAE
+ * case we already have these mappings through head.S.
+ * All user-space mappings are explicitly cleared after
+ * SMP startup.
+ */
+ set_pgd(&base[0], base[USER_PTRS_PER_PGD]);
+#endif
+}
#endif /* CONFIG_PARAVIRT */
#endif /* __ASM_PARAVIRT_H */
===================================================================
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -15,7 +15,10 @@
#include <asm/processor.h>
#include <asm/fixmap.h>
#include <linux/threads.h>
+
+#ifdef CONFIG_PARAVIRT /* guarded to prevent cyclic dependency */
#include <asm/paravirt.h>
+#endif
#ifndef _I386_BITOPS_H
#include <asm/bitops.h>
--
next prev parent reply other threads:[~2007-02-27 8:13 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-02-27 8:13 [patch 00/26] Xen-paravirt_ops: Xen guest implementation for paravirt_ops interface Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 01/26] Xen-paravirt_ops: Fix typo in sync_constant_test_bit()s name Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 02/26] Xen-paravirt_ops: ignore vgacon if hardware not present Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 03/26] Xen-paravirt_ops: use paravirt_nop to consistently mark no-op operations Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 04/26] Xen-paravirt_ops: Add pagetable accessors to pack and unpack pagetable entries Jeremy Fitzhardinge
2007-02-27 10:08 ` Ingo Molnar
2007-02-27 18:52 ` Jeremy Fitzhardinge
2007-02-28 8:32 ` Ingo Molnar
2007-02-28 20:12 ` Jeremy Fitzhardinge
2007-02-28 21:02 ` Rusty Russell
2007-02-27 8:13 ` Jeremy Fitzhardinge [this message]
2007-02-27 10:09 ` [patch 05/26] Xen-paravirt_ops: paravirt_ops: hooks to set up initial pagetable Ingo Molnar
2007-02-27 18:53 ` Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 06/26] Xen-paravirt_ops: paravirt_ops: allocate a fixmap slot Jeremy Fitzhardinge
2007-02-27 10:11 ` Ingo Molnar
2007-02-27 18:55 ` Jeremy Fitzhardinge
2007-02-28 0:49 ` Jeremy Fitzhardinge
2007-02-28 8:30 ` Ingo Molnar
2007-02-28 20:09 ` Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 07/26] Xen-paravirt_ops: Allow paravirt backend to choose kernel PMD sharing Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 08/26] Xen-paravirt_ops: add hooks to intercept mm creation and destruction Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 09/26] Xen-paravirt_ops: remove HAVE_ARCH_MM_LIFETIME, define no-op architecture implementations Jeremy Fitzhardinge
2007-02-27 15:15 ` James Bottomley
2007-02-27 17:21 ` Jeremy Fitzhardinge
2007-02-27 17:35 ` James Bottomley
2007-02-27 17:56 ` Jeremy Fitzhardinge
2007-02-27 18:04 ` James Bottomley
2007-02-27 18:48 ` Jeremy Fitzhardinge
2007-02-27 18:53 ` James Bottomley
2007-02-27 8:13 ` [patch 10/26] Xen-paravirt_ops: rename struct paravirt_patch to paravirt_patch_site for clarity Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 11/26] Xen-paravirt_ops: Use patch site IDs computed from offset in paravirt_ops structure Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 12/26] Xen-paravirt_ops: Fix patch site clobbers to include return register Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 14/26] Xen-paravirt_ops: add common patching machinery Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 15/26] Xen-paravirt_ops: Add apply_to_page_range() which applies a function to a pte range Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 16/26] Xen-paravirt_ops: Allocate and free vmalloc areas Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 17/26] Xen-paravirt_ops: Add nosegneg capability to the vsyscall page notes Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 18/26] Xen-paravirt_ops: Add XEN config options Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 19/26] Xen-paravirt_ops: Add Xen interface header files Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 20/26] Xen-paravirt_ops: Core Xen implementation Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 21/26] Xen-paravirt_ops: Use the hvc console infrastructure for Xen console Jeremy Fitzhardinge
2007-02-27 8:13 ` [patch 22/26] Xen-paravirt_ops: Add early printk support via hvc console Jeremy Fitzhardinge
2007-02-27 8:14 ` [patch 23/26] Xen-paravirt_ops: Add Xen grant table support Jeremy Fitzhardinge
2007-02-27 8:14 ` [patch 24/26] Xen-paravirt_ops: Add the Xenbus sysfs and virtual device hotplug driver Jeremy Fitzhardinge
2007-02-27 8:14 ` [patch 25/26] Xen-paravirt_ops: Add Xen virtual block device driver Jeremy Fitzhardinge
2007-02-27 8:14 ` [patch 26/26] Xen-paravirt_ops: Add the Xen virtual network " Jeremy Fitzhardinge
-- strict thread matches above, loose matches on Subject: below --
2007-03-01 23:24 [patch 00/26] Xen-paravirt_ops: Xen guest implementation for paravirt_ops interface Jeremy Fitzhardinge
2007-03-01 23:24 ` [patch 05/26] Xen-paravirt_ops: paravirt_ops: hooks to set up initial pagetable Jeremy Fitzhardinge
2007-03-16 9:33 ` Ingo Molnar
2007-03-16 18:39 ` Jeremy Fitzhardinge
2007-03-16 19:35 ` Steven Rostedt
2007-03-17 9:47 ` Rusty Russell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070227081630.650554241@goop.org \
--to=jeremy@goop.org \
--cc=ak@muc.de \
--cc=akpm@linux-foundation.org \
--cc=chrisw@sous-sol.org \
--cc=linux-kernel@vger.kernel.org \
--cc=virtualization@lists.osdl.org \
--cc=wli@holomorphy.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).