xen-devel.lists.xenproject.org archive mirror
* [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
@ 2012-07-31 14:43 Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 1/6] xen/mmu: use copy_page instead of memcpy Konrad Rzeszutek Wilk
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini

Changelog:
Since v1: [http://lists.xen.org/archives/html/xen-devel/2012-07/msg01561.html]
 - added more comments and #ifdefs
 - squashed the L4 and the L3 and L2 recycle patches together
 - added Acked-by's

The explanation of these patches is exactly the same as in v1:

The details of this problem are nicely explained in:

 [PATCH 4/6] xen/p2m: Add logic to revector a P2M tree to use __va leafs
 [PATCH 5/6] xen/mmu: Copy and revector the P2M tree.
 [PATCH 6/6] xen/mmu: Remove from __ka space PMD entries for pagetables

and the supporting patches are just nice optimizations. Pasting in
what those patches mentioned:


During bootup Xen supplies us with a P2M array. It sticks
it right after the ramdisk, as can be seen with a 128GB PV guest:

(certain parts removed for clarity):
xc_dom_build_image: called
xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000

So the physical memory and virtual (using __START_KERNEL_map addresses)
layout looks as so:

  phys                             __ka
/------------\                   /-------------------\
| 0          | empty             | 0xffffffff80000000|
| ..         |                   | ..                |
| 16MB       | <= kernel starts  | 0xffffffff81000000|
| ..         |                   |                   |
| 30MB       | <= kernel ends => | 0xffffffff81e43000|
| ..         |  & ramdisk starts | ..                |
| 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
| ..         |  & P2M starts     | ..                |
| ..         |                   | ..                |
| 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
| ..         | start_info        | 0xffffffffa25c7000|
| ..         | xenstore          | 0xffffffffa25c8000|
| ..         | console           | 0xffffffffa25c9000|
| 549MB      | <= page tables => | 0xffffffffa25ca000|
| ..         |                   |                   |
| 550MB      | <= PGT end     => | 0xffffffffa26e1000|
| ..         | boot stack        |                   |
\------------/                   \-------------------/

As can be seen, the ramdisk, P2M and pagetables are taking up
a good chunk of __ka address space, which is a problem since
MODULES_VADDR starts at 0xffffffffa0000000 - and the P2M sits
right in there! This results in the inability to load modules
during bootup, with this error:

------------[ cut here ]------------
WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
Call Trace:
 [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
 [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
 [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
 [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
 [<ffffffff810c6186>] ? load_module+0x66/0x19c0
 [<ffffffff8105cadc>] module_alloc+0x5c/0x60
 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
 [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
 [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
 [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
 [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
 [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
---[ end trace fd8f7704fdea0291 ]---
vmalloc: allocation failure, allocated 16384 of 20480 bytes
modprobe: page allocation failure: order:0, mode:0xd2

Since the __va and __ka are 1:1 up to MODULES_VADDR and
cleanup_highmap rids __ka of the ramdisk mapping, what
we want to do is similar - get rid of the P2M in the __ka
address space. There are two ways of fixing this:

 1) Make all P2M lookups use the __va address instead of the
    __ka address. This means we can safely erase from __ka
    space the PMD pointers that point to the PFNs of the
    P2M array and be OK.
 2) Allocate a new array, copy the existing P2M into it,
    revector the P2M tree to use that, and return the old
    P2M to the memory allocator. This has the advantage that
    it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
    feature. That feature allows us to set the exact virtual
    address space we want for the P2M - and allows us to
    boot as the initial domain on large machines.

So we pick option 2).

This patch only lays the groundwork in the P2M code. The patch
that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
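
To illustrate what option 2) amounts to, here is a minimal userspace sketch
of the idea: allocate a new array in a region we intend to keep, copy the
leaf pages that currently live in the doomed region into it, and swing the
interior pointers over to the copies. The names and sizes below are made up
for the example and are not the kernel's:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEAF_ENTRIES	512	/* entries per leaf page (illustrative) */
#define NR_LEAVES	8	/* tiny tree, just for the example      */

/* Interior level: one pointer per leaf page, like p2m_top/p2m_mid. */
static unsigned long *p2m_mid[NR_LEAVES];

/* Copy every leaf living inside [old_start, old_end) into new_area and
 * repoint the interior entry at the copy - i.e. "revector" the tree.  */
static void revector(unsigned long *old_start, unsigned long *old_end,
		     unsigned long *new_area)
{
	for (int i = 0; i < NR_LEAVES; i++) {
		unsigned long *leaf = p2m_mid[i];

		if (leaf >= old_start && leaf < old_end) {
			unsigned long *copy = new_area + i * LEAF_ENTRIES;

			memcpy(copy, leaf, LEAF_ENTRIES * sizeof(*leaf));
			p2m_mid[i] = copy;	/* lookups now hit the new area */
		}
	}
}

int main(void)
{
	unsigned long *old_area = calloc(NR_LEAVES * LEAF_ENTRIES, sizeof(unsigned long));
	unsigned long *new_area = calloc(NR_LEAVES * LEAF_ENTRIES, sizeof(unsigned long));

	for (int i = 0; i < NR_LEAVES; i++)
		p2m_mid[i] = old_area + i * LEAF_ENTRIES;

	old_area[3] = 0xabcd;	/* pretend pfn 3 maps to mfn 0xabcd */

	revector(old_area, old_area + NR_LEAVES * LEAF_ENTRIES, new_area);
	printf("pfn 3 -> mfn %#lx (old area can now be torn down)\n", p2m_mid[0][3]);

	free(old_area);
	free(new_area);
	return 0;
}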

-- xen/mmu: Copy and revector the P2M tree:

The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
copies the contents of the old one into it, and returns the new one.

At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).

We have revectored the P2M tree (and the one for save/restore as well)
to use the new shiny __va addresses of the new MFNs. The xen_start_info
has already been taken care of in 'xen_setup_kernel_pagetable()' and
xen_start_info->shared_info in 'xen_setup_shared_info()', so
we are free to roam and delete PMD entries - which is exactly what
we are going to do. We rip out the __ka mapping for the old P2M array.
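
For reference, the reason the __ka mapping can simply be dropped is that the
same physical pages stay reachable through the direct map. A toy model of the
address arithmetic (the bases mirror the x86-64 layout above, but the helpers
are simplified stand-ins that assume phys_base == 0, not the kernel's real
__pa/__va):

#include <stdio.h>

/* Illustrative x86-64 bases: direct map (__va) and kernel text map (__ka). */
#define PAGE_OFFSET		0xffff880000000000UL
#define START_KERNEL_MAP	0xffffffff80000000UL

/* Simplified stand-ins for __pa()/__va(), assuming phys_base == 0. */
static unsigned long ka_to_pa(unsigned long ka) { return ka - START_KERNEL_MAP; }
static unsigned long pa_to_va(unsigned long pa) { return pa + PAGE_OFFSET; }

int main(void)
{
	/* The P2M from the 128GB example starts at this __ka address ...   */
	unsigned long p2m_ka = 0xffffffff925c7000UL;
	/* ... and the very same physical pages (pfn 0x125c7, matching the
	 * domain-builder log above) are also reachable via the direct map: */
	unsigned long p2m_va = pa_to_va(ka_to_pa(p2m_ka));

	printf("__ka %#lx -> phys %#lx -> __va %#lx\n",
	       p2m_ka, ka_to_pa(p2m_ka), p2m_va);
	return 0;
}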

-- xen/mmu: Remove from __ka space PMD entries for pagetables:

At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).

The xen_revector_p2m_tree() call and the code around it have ripped out
the __ka mapping for the old P2M array.

Here we continue doing the same for the area where the Xen page-tables
were. It is safe to do so, as the page-tables are addressed using __va.
For good measure we delete anything that is within MODULES_VADDR
and up to the end of the PMD.

At this point the __ka only contains PMD entries for the start
of the kernel up to __brk.


* [PATCH 1/6] xen/mmu: use copy_page instead of memcpy.
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
@ 2012-07-31 14:43 ` Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 2/6] xen/mmu: For 64-bit do not call xen_map_identity_early Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk

After all, this is what it is there for.
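
For context: copy_page(dst, src) copies exactly one page between page-aligned
buffers, which is all these call sites need, since a PMD/PGD table is one page
(512 entries * 8 bytes == 4096 on x86-64). A rough userspace equivalent of the
before/after, with illustrative constants:

#include <string.h>

#define PAGE_SIZE	4096
#define PTRS_PER_PMD	512

/* What the replaced memcpy() calls were spelling out by hand ...          */
void copy_pmd_memcpy(void *dst, const void *src)
{
	memcpy(dst, src, sizeof(unsigned long) * PTRS_PER_PMD);	/* == 4096 */
}

/* ... and what copy_page() boils down to: one full, page-aligned page,
 * which also lets the architecture substitute an optimized copy routine. */
void copy_pmd_page(void *dst, const void *src)
{
	memcpy(dst, src, PAGE_SIZE);
}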

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6ba6100..7247e5a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1754,14 +1754,14 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	 * it will be also modified in the __ka space! (But if you just
 	 * modify the PMD table to point to other PTE's or none, then you
 	 * are OK - which is what cleanup_highmap does) */
-	memcpy(level2_ident_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+	copy_page(level2_ident_pgt, l2);
 	/* Graft it onto L4[511][511] */
-	memcpy(level2_kernel_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+	copy_page(level2_kernel_pgt, l2);
 
 	/* Get [511][510] and graft that in level2_fixmap_pgt */
 	l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
 	l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
-	memcpy(level2_fixmap_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+	copy_page(level2_fixmap_pgt, l2);
 	/* Note that we don't do anything with level1_fixmap_pgt which
 	 * we don't need. */
 
@@ -1821,8 +1821,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
 	 */
 	swapper_kernel_pmd =
 		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
-	memcpy(swapper_kernel_pmd, initial_kernel_pmd,
-	       sizeof(pmd_t) * PTRS_PER_PMD);
+	copy_page(swapper_kernel_pmd, initial_kernel_pmd);
 	swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
 		__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
 	set_page_prot(swapper_kernel_pmd, PAGE_KERNEL_RO);
@@ -1851,11 +1850,11 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 				  512*1024);
 
 	kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
-	memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
+	copy_page(initial_kernel_pmd, kernel_pmd);
 
 	xen_map_identity_early(initial_kernel_pmd, max_pfn);
 
-	memcpy(initial_page_table, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
+	copy_page(initial_page_table, pgd);
 	initial_page_table[KERNEL_PGD_BOUNDARY] =
 		__pgd(__pa(initial_kernel_pmd) | _PAGE_PRESENT);
 
-- 
1.7.7.6


* [PATCH 2/6] xen/mmu: For 64-bit do not call xen_map_identity_early
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 1/6] xen/mmu: use copy_page instead of memcpy Konrad Rzeszutek Wilk
@ 2012-07-31 14:43 ` Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 3/6] xen/mmu: Recycle the Xen provided L4, L3, and L2 pages Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk

Because we do not need it. During startup Xen provides
us with all the memory mappings that we need to function.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 7247e5a..a59070b 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -84,6 +84,7 @@
  */
 DEFINE_SPINLOCK(xen_reservation_lock);
 
+#ifdef CONFIG_X86_32
 /*
  * Identity map, in addition to plain kernel map.  This needs to be
  * large enough to allocate page table pages to allocate the rest.
@@ -91,7 +92,7 @@ DEFINE_SPINLOCK(xen_reservation_lock);
  */
 #define LEVEL1_IDENT_ENTRIES	(PTRS_PER_PTE * 4)
 static RESERVE_BRK_ARRAY(pte_t, level1_ident_pgt, LEVEL1_IDENT_ENTRIES);
-
+#endif
 #ifdef CONFIG_X86_64
 /* l3 pud for userspace vsyscall mapping */
 static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
@@ -1628,7 +1629,7 @@ static void set_page_prot(void *addr, pgprot_t prot)
 	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
 		BUG();
 }
-
+#ifdef CONFIG_X86_32
 static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 {
 	unsigned pmdidx, pteidx;
@@ -1679,7 +1680,7 @@ static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 
 	set_page_prot(pmd, PAGE_KERNEL_RO);
 }
-
+#endif
 void __init xen_setup_machphys_mapping(void)
 {
 	struct xen_machphys_mapping mapping;
@@ -1765,14 +1766,12 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	/* Note that we don't do anything with level1_fixmap_pgt which
 	 * we don't need. */
 
-	/* Set up identity map */
-	xen_map_identity_early(level2_ident_pgt, max_pfn);
-
 	/* Make pagetable pieces RO */
 	set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
+	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
 	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
 
-- 
1.7.7.6


* [PATCH 3/6] xen/mmu: Recycle the Xen provided L4, L3, and L2 pages
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 1/6] xen/mmu: use copy_page instead of memcpy Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 2/6] xen/mmu: For 64-bit do not call xen_map_identity_early Konrad Rzeszutek Wilk
@ 2012-07-31 14:43 ` Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 4/6] xen/p2m: Add logic to revector a P2M tree to use __va leafs Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk

We are not using them - we end up using only the L1 pagetables
and grafting those onto our page-tables.
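
To make the "recycle" part concrete, a small userspace sketch of the
bookkeeping idea follows (the pfn values come from the 128GB example in the
cover letter; treating the first three frames as the L4, L3 and L2 is purely
illustrative - as the comment in the patch notes, the order differs between
the initial domain and toolstack-built guests):

#include <stdio.h>

/* Toy model of the bookkeeping that check_pt_base() performs: pt_base and
 * pt_last bound (inclusively, in this sketch) the pfn range of the
 * Xen-provided pagetable frames. A frame that is no longer needed can only
 * be handed back - and the range shrunk - if it sits at one of the two
 * ends; frames in the middle still have live neighbours. */
static void recycle_frame(unsigned long *pt_base, unsigned long *pt_last,
			  unsigned long frame_pfn)
{
	if (*pt_base == frame_pfn)
		(*pt_base)++;			/* trim from the front */
	else if (*pt_last == frame_pfn)
		(*pt_last)--;			/* trim from the back  */
}

int main(void)
{
	/* Numbers from the 128GB example: 0x117 pagetable frames starting
	 * at pfn 0x225ca; assume the first three are the L4, L3 and L2.   */
	unsigned long pt_base = 0x225ca, pt_last = 0x225ca + 0x117 - 1;
	unsigned long unused[] = { 0x225ca, 0x225cb, 0x225cc };

	for (unsigned int i = 0; i < sizeof(unused) / sizeof(unused[0]); i++)
		recycle_frame(&pt_base, &pt_last, unused[i]);

	/* Reserve only what is left - three pages fewer than before. */
	printf("reserve %lu frames starting at pfn %#lx\n",
	       pt_last - pt_base + 1, pt_base);
	return 0;
}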

[v1: Per Stefano's suggestion squashed two commits]
[v2: Per Stefano's suggestion simplified loop]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   40 +++++++++++++++++++++++++++++++++-------
 1 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index a59070b..de4b8fd 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1708,7 +1708,20 @@ static void convert_pfn_mfn(void *v)
 	for (i = 0; i < PTRS_PER_PTE; i++)
 		pte[i] = xen_make_pte(pte[i].pte);
 }
-
+static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
+			    unsigned long addr)
+{
+	if (*pt_base == PFN_DOWN(__pa(addr))) {
+		set_page_prot((void *)addr, PAGE_KERNEL);
+		clear_page((void *)addr);
+		(*pt_base)++;
+	}
+	if (*pt_end == PFN_DOWN(__pa(addr))) {
+		set_page_prot((void *)addr, PAGE_KERNEL);
+		clear_page((void *)addr);
+		(*pt_end)--;
+	}
+}
 /*
  * Set up the initial kernel pagetable.
  *
@@ -1724,6 +1737,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
 	pud_t *l3;
 	pmd_t *l2;
+	unsigned long addr[3];
+	unsigned long pt_base, pt_end;
+	unsigned i;
 
 	/* max_pfn_mapped is the last pfn mapped in the initial memory
 	 * mappings. Considering that on Xen after the kernel mappings we
@@ -1731,6 +1747,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	 * set max_pfn_mapped to the last real pfn mapped. */
 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
 
+	pt_base = PFN_DOWN(__pa(xen_start_info->pt_base));
+	pt_end = PFN_DOWN(__pa(xen_start_info->pt_base + (xen_start_info->nr_pt_frames * PAGE_SIZE)));
+
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 
@@ -1749,6 +1768,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
 	l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
 
+	addr[0] = (unsigned long)pgd;
+	addr[1] = (unsigned long)l3;
+	addr[2] = (unsigned long)l2;
 	/* Graft it onto L4[272][0]. Note that we creating an aliasing problem:
 	 * Both L4[272][0] and L4[511][511] have entries that point to the same
 	 * L2 (PMD) tables. Meaning that if you modify it in __va space
@@ -1782,20 +1804,24 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	/* Unpin Xen-provided one */
 	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
 
-	/* Switch over */
-	pgd = init_level4_pgt;
-
 	/*
 	 * At this stage there can be no user pgd, and no page
 	 * structure to attach it to, so make sure we just set kernel
 	 * pgd.
 	 */
 	xen_mc_batch();
-	__xen_write_cr3(true, __pa(pgd));
+	__xen_write_cr3(true, __pa(init_level4_pgt));
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
 
-	memblock_reserve(__pa(xen_start_info->pt_base),
-			 xen_start_info->nr_pt_frames * PAGE_SIZE);
+	/* We can't rip out the L3 and L2 that easily, as the Xen pagetables
+	 * are laid out this way: [L4], [L1], [L2], [L3], [L1], [L1] ... for
+	 * the initial domain. For guests built by the toolstack, they are in
+	 * [L4], [L3], [L2], [L1], [L1], ... order. */
+	for (i = 0; i < ARRAY_SIZE(addr); i++)
+		check_pt_base(&pt_base, &pt_end, addr[i]);
+
+	/* Our (by three pages) smaller Xen pagetable that we are using */
+	memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
 }
 #else	/* !CONFIG_X86_64 */
 static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
-- 
1.7.7.6


* [PATCH 4/6] xen/p2m: Add logic to revector a P2M tree to use __va leafs.
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2012-07-31 14:43 ` [PATCH 3/6] xen/mmu: Recycle the Xen provided L4, L3, and L2 pages Konrad Rzeszutek Wilk
@ 2012-07-31 14:43 ` Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 5/6] xen/mmu: Copy and revector the P2M tree Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk

During bootup Xen supplies us with a P2M array. It sticks
it right after the ramdisk, as can be seen with a 128GB PV guest:

(certain parts removed for clarity):
xc_dom_build_image: called
xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000

So the physical memory and virtual (using __START_KERNEL_map addresses)
layout looks as so:

  phys                             __ka
/------------\                   /-------------------\
| 0          | empty             | 0xffffffff80000000|
| ..         |                   | ..                |
| 16MB       | <= kernel starts  | 0xffffffff81000000|
| ..         |                   |                   |
| 30MB       | <= kernel ends => | 0xffffffff81e43000|
| ..         |  & ramdisk starts | ..                |
| 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
| ..         |  & P2M starts     | ..                |
| ..         |                   | ..                |
| 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
| ..         | start_info        | 0xffffffffa25c7000|
| ..         | xenstore          | 0xffffffffa25c8000|
| ..         | console           | 0xffffffffa25c9000|
| 549MB      | <= page tables => | 0xffffffffa25ca000|
| ..         |                   |                   |
| 550MB      | <= PGT end     => | 0xffffffffa26e1000|
| ..         | boot stack        |                   |
\------------/                   \-------------------/

As can be seen, the ramdisk, P2M and pagetables are taking up
a good chunk of __ka address space, which is a problem since
MODULES_VADDR starts at 0xffffffffa0000000 - and the P2M sits
right in there! This results in the inability to load modules
during bootup, with this error:

------------[ cut here ]------------
WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
Call Trace:
 [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
 [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
 [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
 [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
 [<ffffffff810c6186>] ? load_module+0x66/0x19c0
 [<ffffffff8105cadc>] module_alloc+0x5c/0x60
 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
 [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
 [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
 [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
 [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
 [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
---[ end trace fd8f7704fdea0291 ]---
vmalloc: allocation failure, allocated 16384 of 20480 bytes
modprobe: page allocation failure: order:0, mode:0xd2

Since the __va and __ka are 1:1 up to MODULES_VADDR and
cleanup_highmap rids __ka of the ramdisk mapping, what
we want to do is similar - get rid of the P2M in the __ka
address space. There are two ways of fixing this:

 1) Make all P2M lookups use the __va address instead of the
    __ka address. This means we can safely erase from __ka
    space the PMD pointers that point to the PFNs of the
    P2M array and be OK.
 2) Allocate a new array, copy the existing P2M into it,
    revector the P2M tree to use that, and return the old
    P2M to the memory allocator. This has the advantage that
    it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
    feature. That feature allows us to set the exact virtual
    address space we want for the P2M - and allows us to
    boot as the initial domain on large machines.

So we pick option 2).

This patch only lays the groundwork in the P2M code. The patch
that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
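
For readers unfamiliar with the P2M layout this code walks: on 64-bit the
tree is three levels deep, each level one page of 512 entries, and a pfn
decomposes into top/mid/leaf indices roughly as in this standalone
restatement of the helpers (simplified; the real ones live in
arch/x86/xen/p2m.c):

#include <stdio.h>

#define P2M_PER_PAGE		512	/* unsigned longs per leaf page */
#define P2M_MID_PER_PAGE	512	/* leaf pointers per mid page   */

static unsigned long p2m_top_index(unsigned long pfn)
{
	return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
}

static unsigned long p2m_mid_index(unsigned long pfn)
{
	return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
}

static unsigned long p2m_index(unsigned long pfn)
{
	return pfn % P2M_PER_PAGE;
}

int main(void)
{
	/* pfn 0x125c7 is where the toolstack placed the P2M in the log above. */
	unsigned long pfn = 0x125c7;

	printf("pfn %#lx -> top %lu, mid %lu, idx %lu\n", pfn,
	       p2m_top_index(pfn), p2m_mid_index(pfn), p2m_index(pfn));
	/* The revector loop walks pfns in P2M_PER_PAGE steps, i.e. one leaf
	 * page (512 * 8 bytes = one 4K page of the array) per iteration.   */
	return 0;
}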

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c     |   70 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h |    1 +
 2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 6a2bfa4..bbfd085 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -394,7 +394,77 @@ void __init xen_build_dynamic_phys_to_machine(void)
 	 * Xen provided pagetable). Do it later in xen_reserve_internals.
 	 */
 }
+#ifdef CONFIG_X86_64
+#include <linux/bootmem.h>
+unsigned long __init xen_revector_p2m_tree(void)
+{
+	unsigned long va_start;
+	unsigned long va_end;
+	unsigned long pfn;
+	unsigned long *mfn_list = NULL;
+	unsigned long size;
+
+	va_start = xen_start_info->mfn_list;
+	/* We copy in increments of P2M_PER_PAGE * sizeof(unsigned long),
+	 * so make sure it is rounded up to that */
+	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+	va_end = va_start + size;
+
+	/* If we were revectored already, don't do it again. */
+	if (va_start <= __START_KERNEL_map && va_start >= __PAGE_OFFSET)
+		return 0;
+
+	mfn_list = alloc_bootmem_align(size, PAGE_SIZE);
+	if (!mfn_list) {
+		pr_warn("Could not allocate space for a new P2M tree!\n");
+		return xen_start_info->mfn_list;
+	}
+	/* Fill it out with INVALID_P2M_ENTRY value */
+	memset(mfn_list, 0xFF, size);
+
+	for (pfn = 0; pfn < ALIGN(MAX_DOMAIN_PAGES, P2M_PER_PAGE); pfn += P2M_PER_PAGE) {
+		unsigned topidx = p2m_top_index(pfn);
+		unsigned mididx;
+		unsigned long *mid_p;
+
+		if (!p2m_top[topidx])
+			continue;
+
+		if (p2m_top[topidx] == p2m_mid_missing)
+			continue;
+
+		mididx = p2m_mid_index(pfn);
+		mid_p = p2m_top[topidx][mididx];
+		if (!mid_p)
+			continue;
+		if ((mid_p == p2m_missing) || (mid_p == p2m_identity))
+			continue;
+
+		if ((unsigned long)mid_p == INVALID_P2M_ENTRY)
+			continue;
+
+		/* The old va. Rebase it on mfn_list */
+		if (mid_p >= (unsigned long *)va_start && mid_p <= (unsigned long *)va_end) {
+			unsigned long *new;
+
+			new = &mfn_list[pfn];
+
+			copy_page(new, mid_p);
+			p2m_top[topidx][mididx] = &mfn_list[pfn];
+			p2m_top_mfn_p[topidx][mididx] = virt_to_mfn(&mfn_list[pfn]);
 
+		}
+		/* This should be the leafs allocated for identity from _brk. */
+	}
+	return (unsigned long)mfn_list;
+
+}
+#else
+unsigned long __init xen_revector_p2m_tree(void)
+{
+	return 0;
+}
+#endif
 unsigned long get_phys_to_machine(unsigned long pfn)
 {
 	unsigned topidx, mididx, idx;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 2230f57..bb5a810 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -45,6 +45,7 @@ void xen_hvm_init_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
+unsigned long __init xen_revector_p2m_tree(void);
 
 void xen_init_irq_ops(void);
 void xen_setup_timer(int cpu);
-- 
1.7.7.6


* [PATCH 5/6] xen/mmu: Copy and revector the P2M tree.
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2012-07-31 14:43 ` [PATCH 4/6] xen/p2m: Add logic to revector a P2M tree to use __va leafs Konrad Rzeszutek Wilk
@ 2012-07-31 14:43 ` Konrad Rzeszutek Wilk
  2012-07-31 14:43 ` [PATCH 6/6] xen/mmu: Remove from __ka space PMD entries for pagetables Konrad Rzeszutek Wilk
  2012-08-01 15:50 ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
  6 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk

Please first read the description in "xen/p2m: Add logic to revector a
P2M tree to use __va leafs" patch.

The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
copies the contents of the old one into it, and returns the new one.

At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).

We have revectored the P2M tree (and the one for save/restore as well)
to use the new shiny __va addresses of the new MFNs. The xen_start_info
has already been taken care of in 'xen_setup_kernel_pagetable()' and
xen_start_info->shared_info in 'xen_setup_shared_info()', so
we are free to roam and delete PMD entries - which is exactly what
we are going to do. We rip out the __ka mapping for the old P2M array.
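
To put numbers on the sizes involved: the linear mfn_list of an N-page guest
occupies N * sizeof(unsigned long) bytes, and the cleanup below rounds that
up to whole 2MB PMDs. A back-of-the-envelope check against the 128GB example
(illustrative arithmetic only, not code from the patch):

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define PMD_SIZE	(2UL << 20)		/* one PMD maps 2MB */

static unsigned long roundup(unsigned long x, unsigned long to)
{
	return (x + to - 1) / to * to;
}

int main(void)
{
	unsigned long nr_pages = 128UL << 30 >> 12;	/* 128GB guest -> pfns */
	unsigned long size = nr_pages * sizeof(unsigned long);

	/* 0x2000000 pfns * 8 bytes = 256MB - which is exactly the 0x10000
	 * page phys2mach segment in the domain-builder log above.          */
	printf("mfn_list: %lu MB, %lu pages, %lu PMD entries to clean\n",
	       size >> 20, size / PAGE_SIZE, roundup(size, PMD_SIZE) / PMD_SIZE);
	return 0;
}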

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index de4b8fd..9358b75 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1183,9 +1183,64 @@ static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
 
 static void xen_post_allocator_init(void);
 
+#ifdef CONFIG_X86_64
+void __init xen_cleanhighmap(unsigned long vaddr, unsigned long vaddr_end)
+{
+	unsigned long kernel_end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
+	pmd_t *pmd = level2_kernel_pgt + pmd_index(vaddr);
+
+	/* NOTE: The loop is more greedy than the cleanup_highmap variant.
+	 * We include the PMD passed in on _both_ boundaries. */
+	for (; vaddr <= vaddr_end && (pmd < (level2_kernel_pgt + PTRS_PER_PMD));
+			pmd++, vaddr += PMD_SIZE) {
+		if (pmd_none(*pmd))
+			continue;
+		if (vaddr < (unsigned long) _text || vaddr > kernel_end)
+			set_pmd(pmd, __pmd(0));
+	}
+	/* In case we did something silly, we should crash in this function
+	 * instead of somewhere later and be confusing. */
+	xen_mc_flush();
+}
+#endif
 static void __init xen_pagetable_setup_done(pgd_t *base)
 {
+#ifdef CONFIG_X86_64
+	unsigned long size;
+	unsigned long addr;
+#endif
+
 	xen_setup_shared_info();
+#ifdef CONFIG_X86_64
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		unsigned long new_mfn_list;
+
+		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+
+		new_mfn_list = xen_revector_p2m_tree();
+
+		/* On 32-bit, we get zero so this never gets executed. */
+		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
+			/* using __ka address! */
+			memset((void *)xen_start_info->mfn_list, 0, size);
+
+			/* We should be in __ka space. */
+			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
+			addr = xen_start_info->mfn_list;
+			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+			/* We roundup to the PMD, which means that if anybody at this stage is
+			 * using the __ka address of xen_start_info or xen_start_info->shared_info
+			 * they are going to crash. Fortunately we have already revectored
+			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
+			size = roundup(size, PMD_SIZE);
+			xen_cleanhighmap(addr, addr + size);
+
+			memblock_free(__pa(xen_start_info->mfn_list), size);
+			/* And revector! Bye bye old array */
+			xen_start_info->mfn_list = new_mfn_list;
+		}
+	}
+#endif
 	xen_post_allocator_init();
 }
 
@@ -1822,6 +1877,8 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 
 	/* Our (by three pages) smaller Xen pagetable that we are using */
 	memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
+	/* Revector the xen_start_info */
+	xen_start_info = (struct start_info *)__va(__pa(xen_start_info));
 }
 #else	/* !CONFIG_X86_64 */
 static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
-- 
1.7.7.6


* [PATCH 6/6] xen/mmu: Remove from __ka space PMD entries for pagetables.
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2012-07-31 14:43 ` [PATCH 5/6] xen/mmu: Copy and revector the P2M tree Konrad Rzeszutek Wilk
@ 2012-07-31 14:43 ` Konrad Rzeszutek Wilk
  2012-08-01 15:50 ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
  6 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-31 14:43 UTC (permalink / raw)
  To: linux-kernel, xen-devel, jbeulich, stefano.stabellini
  Cc: Konrad Rzeszutek Wilk

Please first read the description in "xen/mmu: Copy and revector the
P2M tree."

At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).

The xen_revector_p2m_tree() call and the code around it have ripped out
the __ka mapping for the old P2M array.

Here we continue doing the same for the area where the Xen page-tables
were. It is safe to do so, as the page-tables are addressed using __va.
For good measure we delete anything that is within MODULES_VADDR
and up to the end of the PMD.

At this point the __ka only contains PMD entries for the start
of the kernel up to __brk.
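
For reference, the "up to the end of the PMD" / PUD rounding for the debug
sweep works out as follows (illustrative arithmetic with the standard x86-64
constants assumed):

#include <stdio.h>

#define MODULES_VADDR	0xffffffffa0000000UL	/* start of module area */
#define PMD_SIZE	(2UL << 20)		/* 2MB                  */
#define PUD_SIZE	(1UL << 30)		/* 1GB                  */

static unsigned long roundup(unsigned long x, unsigned long to)
{
	return (x + to - 1) / to * to;
}

int main(void)
{
	/* The debug-only sweep covers MODULES_VADDR up to the next 1GB
	 * boundary, i.e. the top 512MB of the kernel mapping - 256 PMD
	 * entries that should already be empty at this point.            */
	unsigned long end = roundup(MODULES_VADDR, PUD_SIZE) - 1;

	printf("sweep %#lx .. %#lx (%lu PMDs)\n", MODULES_VADDR, end,
	       (end + 1 - MODULES_VADDR) / PMD_SIZE);
	return 0;
}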

[v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 9358b75..fa4d208 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1240,6 +1240,25 @@ static void __init xen_pagetable_setup_done(pgd_t *base)
 			xen_start_info->mfn_list = new_mfn_list;
 		}
 	}
+	/* At this stage, cleanup_highmap has already cleaned __ka space
+	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
+	 * the ramdisk). We continue on, erasing PMD entries that point to page
+	 * tables - do note that they are accessible at this stage via __va.
+	 * For good measure we also round up to the PMD - which means that if
+	 * anybody is still using the __ka address of the initial boot-stack and
+	 * tries to use it, they are going to crash. The xen_start_info has been
+	 * taken care of already in xen_setup_kernel_pagetable. */
+	addr = xen_start_info->pt_base;
+	size = roundup(xen_start_info->nr_pt_frames * PAGE_SIZE, PMD_SIZE);
+
+	xen_cleanhighmap(addr, addr + size);
+	xen_start_info->pt_base = (unsigned long)__va(__pa(xen_start_info->pt_base));
+#ifdef DEBUG
+	/* This is superfluous and shouldn't be necessary, but you know what,
+	 * let's do it. The MODULES_VADDR -> MODULES_END range should be clear
+	 * of anything at this stage. */
+	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
+#endif
 #endif
 	xen_post_allocator_init();
 }
-- 
1.7.7.6


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2012-07-31 14:43 ` [PATCH 6/6] xen/mmu: Remove from __ka space PMD entries for pagetables Konrad Rzeszutek Wilk
@ 2012-08-01 15:50 ` Konrad Rzeszutek Wilk
  2012-08-02  9:05   ` Jan Beulich
  6 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-01 15:50 UTC (permalink / raw)
  To: xen-devel, jbeulich, stefano.stabellini

On Tue, Jul 31, 2012 at 10:43:18AM -0400, Konrad Rzeszutek Wilk wrote:
> Changelog:
> Since v1: [http://lists.xen.org/archives/html/xen-devel/2012-07/msg01561.html]
>  - added more comments, and #ifdefs
>  - squashed The L4 and L4, L3, and L2 recycle patches together
>  - Added Acked-by's
> 
> The explanation of these patches is exactly what v1 had:
> 
> The details of this problem are nicely explained in:
> 
>  [PATCH 4/6] xen/p2m: Add logic to revector a P2M tree to use __va
>  [PATCH 5/6] xen/mmu: Copy and revector the P2M tree.
>  [PATCH 6/6] xen/mmu: Remove from __ka space PMD entries for
> 
> and the supporting patches are just nice optimizations. Pasting in
> what those patches mentioned:

With these patches I've gotten it to boot up to 384GB. Around that area
something weird happens - mainly the pagetables that the toolstack allocated
seem to have missing data. I haven't looked into the details, but this is
what the domain builder tells me:


xc_dom_alloc_segment:   ramdisk      : 0xffffffff82278000 -> 0xffffffff930b4000  (pfn 0x2278 + 0x10e3c pages)
xc_dom_malloc            : 1621 kB
xc_dom_pfn_to_ptr: domU mapping: pfn 0x2278+0x10e3c at 0x7fb0853a2000
xc_dom_do_gunzip: unzip ok, 0x4ba831c -> 0x10e3be10
xc_dom_alloc_segment:   phys2mach    : 0xffffffff930b4000 -> 0xffffffffc30b4000  (pfn 0x130b4 + 0x30000 pages)
xc_dom_malloc            : 4608 kB
xc_dom_pfn_to_ptr: domU mapping: pfn 0x130b4+0x30000 at 0x7fb0553a2000
xc_dom_alloc_page   :   start info   : 0xffffffffc30b4000 (pfn 0x430b4)
xc_dom_alloc_page   :   xenstore     : 0xffffffffc30b5000 (pfn 0x430b5)
xc_dom_alloc_page   :   console      : 0xffffffffc30b6000 (pfn 0x430b6)
nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffffffffff, 2 table(s)
nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffc33fffff, 538 table(s)
xc_dom_alloc_segment:   page tables  : 0xffffffffc30b7000 -> 0xffffffffc32d5000  (pfn 0x430b7 + 0x21e pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x430b7+0x21e at 0x7fb055184000
xc_dom_alloc_page   :   boot stack   : 0xffffffffc32d5000 (pfn 0x432d5)
xc_dom_build_image  : virt_alloc_end : 0xffffffffc32d6000
xc_dom_build_image  : virt_pgtab_end : 0xffffffffc3400000

Note that it is 0xffffffffc30b4000 - so already past level2_kernel_pgt (L3[510])
and in level2_fixmap_pgt territory (L3[511]).

Hypervisor tells me:

(XEN) Pagetable walk from ffffffffc32d5ff8:
(XEN)  L4[0x1ff] = 000000b9804d9067 00000000000430b8
(XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 13 (vcpu#0) crashed on cpu#121:
(XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    121
(XEN) RIP:    e033:[<ffffffff818a4200>]
(XEN) RFLAGS: 0000000000010202   EM: 1   CONTEXT: pv guest
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: ffffffffc30b4000   rdi: 0000000000000000
(XEN) rbp: 0000000000000000   rsp: ffffffffc32d6000   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000b9804da000   cr2: ffffffffc32d5ff8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffffc32d6000:
(XEN)   Fault while accessing guest memory.

And that RIP translates to ffffffff818a4200 T startup_xen,
which does:

ENTRY(startup_xen)
        cld      
ffffffff818a4200:       fc                      cld      
#ifdef CONFIG_X86_32
        mov %esi,xen_start_info
        mov $init_thread_union+THREAD_SIZE,%esp
#else
        mov %rsi,xen_start_info
ffffffff818a4201:       48 89 34 25 48 92 94    mov    %rsi,0xffffffff81949248
ffffffff818a4208:       81       


At that stage we are still operating on the Xen-provided pagetables - which
look to have L4[511][511] empty! That sounds to me like a Xen tool-stack
problem. Jan, have you seen something similar to this?
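
For anyone following along, the faulting address decomposes into page-table
indices as below (standard x86-64 4-level split, sketched standalone), which
lines up with the hypervisor's walk output:

#include <stdio.h>

/* Standard x86-64 4-level split: 9 bits per level, 12-bit page offset. */
static unsigned int pt_index(unsigned long addr, int level)
{
	return (addr >> (12 + 9 * (level - 1))) & 0x1ff;
}

int main(void)
{
	unsigned long cr2 = 0xffffffffc32d5ff8UL;	/* faulting boot-stack addr */

	/* Matches the hypervisor walk above: L4[0x1ff], L3[0x1ff] - and the
	 * L3 entry (level2_fixmap_pgt territory) is the one that is empty. */
	printf("L4[%#x] L3[%#x] L2[%#x] L1[%#x]\n",
	       pt_index(cr2, 4), pt_index(cr2, 3),
	       pt_index(cr2, 2), pt_index(cr2, 1));
	return 0;
}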


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-01 15:50 ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
@ 2012-08-02  9:05   ` Jan Beulich
  2012-08-02 14:17     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2012-08-02  9:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Stefano Stabellini

>>> On 01.08.12 at 17:50, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> With these patches I've gotten it to boot up to 384GB. Around that area
> something weird happens - mainly the pagetables that the toolstack allocated
> seems to have missing data. I hadn't looked in details, but this is what
> domain builder tells me:
> 
> 
> xc_dom_alloc_segment:   ramdisk      : 0xffffffff82278000 -> 
> 0xffffffff930b4000  (pfn 0x2278 + 0x10e3c pages)
> xc_dom_malloc            : 1621 kB
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x2278+0x10e3c at 0x7fb0853a2000
> xc_dom_do_gunzip: unzip ok, 0x4ba831c -> 0x10e3be10
> xc_dom_alloc_segment:   phys2mach    : 0xffffffff930b4000 -> 
> 0xffffffffc30b4000  (pfn 0x130b4 + 0x30000 pages)
> xc_dom_malloc            : 4608 kB
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x130b4+0x30000 at 0x7fb0553a2000
> xc_dom_alloc_page   :   start info   : 0xffffffffc30b4000 (pfn 0x430b4)
> xc_dom_alloc_page   :   xenstore     : 0xffffffffc30b5000 (pfn 0x430b5)
> xc_dom_alloc_page   :   console      : 0xffffffffc30b6000 (pfn 0x430b6)
> nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 
> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 
> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 
> 0xffffffffffffffff, 2 table(s)
> nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 
> 0xffffffffc33fffff, 538 table(s)
> xc_dom_alloc_segment:   page tables  : 0xffffffffc30b7000 -> 
> 0xffffffffc32d5000  (pfn 0x430b7 + 0x21e pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x430b7+0x21e at 0x7fb055184000
> xc_dom_alloc_page   :   boot stack   : 0xffffffffc32d5000 (pfn 0x432d5)
> xc_dom_build_image  : virt_alloc_end : 0xffffffffc32d6000
> xc_dom_build_image  : virt_pgtab_end : 0xffffffffc3400000
> 
> Note it is is 0xffffffffc30b4000 - so already past the level2_kernel_pgt 
> (L3[510]
> and in level2_fixmap_pgt territory (L3[511]).
> 
> At that stage we are still operating using the Xen provided pagetable - which
> look to have the L4[511][511] empty! Which sounds to me like a Xen tool-stack
> problem? Jan, have you seen something similar to this?

No we haven't, but I also don't think anyone tried to create as
big a DomU. I was, however, under the impression that DomU-s
this big had been created at Oracle before. Or was that only up
to 256Gb perhaps?

In any case, setup_pgtables_x86_64() indeed looks flawed
to me: While the clearing of l1tab looks right, l[23]tab get
cleared (and hence a new table allocated) too early. l2tab
should really get cleared only when l1tab gets cleared _and_
the L2 clearing condition is true. Similarly for l3tab then, and
of course - even though it would unlikely ever matter -
setup_pgtables_x86_32_pae() is broken in the same way.

Afaict this got broken with the domain build re-write between
3.0.4 and 3.1 (the old code looks alright).
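
To illustrate the pattern being described (this is not the actual libxc
code - the real fix is further down the thread): the walk keeps a "current
table" pointer per level and allocates a fresh table whenever that pointer
is NULL, so resetting an outer-level pointer at its own 512-entry boundary,
independently of whether the inner level also wrapped, allocates the outer
table too early. The inner-most wrap has to gate the outer resets, nested
roughly like this:

/* Sketch of the corrected end-of-iteration logic for a multi-level walk,
 * mirroring the nesting described above; ENTRIES stands in for the
 * L?_PAGETABLE_ENTRIES_* constants. */
#define ENTRIES 512

void maybe_advance(unsigned int l1off, unsigned int l2off, unsigned int l3off,
		   void **l1tab, void **l2tab, void **l3tab)
{
	if (l1off == ENTRIES - 1) {		/* the L1 page is full ...   */
		*l1tab = NULL;			/* ... so a new L1 is needed */
		if (l2off == ENTRIES - 1) {	/* only then may the L2 wrap */
			*l2tab = NULL;
			if (l3off == ENTRIES - 1)	/* and only then the L3 */
				*l3tab = NULL;
		}
	}
}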

Jan


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-02  9:05   ` Jan Beulich
@ 2012-08-02 14:17     ` Konrad Rzeszutek Wilk
  2012-08-02 23:04       ` Mukesh Rathor
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-02 14:17 UTC (permalink / raw)
  To: Jan Beulich, Mukesh Rathor; +Cc: Stefano Stabellini, xen-devel

On Thu, Aug 02, 2012 at 10:05:27AM +0100, Jan Beulich wrote:
> >>> On 01.08.12 at 17:50, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > With these patches I've gotten it to boot up to 384GB. Around that area
> > something weird happens - mainly the pagetables that the toolstack allocated
> > seems to have missing data. I hadn't looked in details, but this is what
> > domain builder tells me:
> > 
> > 
> > xc_dom_alloc_segment:   ramdisk      : 0xffffffff82278000 -> 
> > 0xffffffff930b4000  (pfn 0x2278 + 0x10e3c pages)
> > xc_dom_malloc            : 1621 kB
> > xc_dom_pfn_to_ptr: domU mapping: pfn 0x2278+0x10e3c at 0x7fb0853a2000
> > xc_dom_do_gunzip: unzip ok, 0x4ba831c -> 0x10e3be10
> > xc_dom_alloc_segment:   phys2mach    : 0xffffffff930b4000 -> 
> > 0xffffffffc30b4000  (pfn 0x130b4 + 0x30000 pages)
> > xc_dom_malloc            : 4608 kB
> > xc_dom_pfn_to_ptr: domU mapping: pfn 0x130b4+0x30000 at 0x7fb0553a2000
> > xc_dom_alloc_page   :   start info   : 0xffffffffc30b4000 (pfn 0x430b4)
> > xc_dom_alloc_page   :   xenstore     : 0xffffffffc30b5000 (pfn 0x430b5)
> > xc_dom_alloc_page   :   console      : 0xffffffffc30b6000 (pfn 0x430b6)
> > nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 
> > 0xffffffffffffffff, 1 table(s)
> > nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 
> > 0xffffffffffffffff, 1 table(s)
> > nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 
> > 0xffffffffffffffff, 2 table(s)
> > nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 
> > 0xffffffffc33fffff, 538 table(s)
> > xc_dom_alloc_segment:   page tables  : 0xffffffffc30b7000 -> 
> > 0xffffffffc32d5000  (pfn 0x430b7 + 0x21e pages)
> > xc_dom_pfn_to_ptr: domU mapping: pfn 0x430b7+0x21e at 0x7fb055184000
> > xc_dom_alloc_page   :   boot stack   : 0xffffffffc32d5000 (pfn 0x432d5)
> > xc_dom_build_image  : virt_alloc_end : 0xffffffffc32d6000
> > xc_dom_build_image  : virt_pgtab_end : 0xffffffffc3400000
> > 
> > Note it is is 0xffffffffc30b4000 - so already past the level2_kernel_pgt 
> > (L3[510]
> > and in level2_fixmap_pgt territory (L3[511]).
> > 
> > At that stage we are still operating using the Xen provided pagetable - which
> > look to have the L4[511][511] empty! Which sounds to me like a Xen tool-stack
> > problem? Jan, have you seen something similar to this?
> 
> No we haven't, but I also don't think anyone tried to create as
> big a DomU. I was, however, under the impression that DomU-s
> this big had been created at Oracle before. Or was that only up
> to 256Gb perhaps?

Mukesh do you recall? Was it with OVM2.2.2 which was 3.4 based?
It might be that we did not have the 1TB hardware at that time yet.

Or perhaps I am missing some bug-fix from the old product..

> 
> In any case, setup_pgtables_x86_64() indeed looks flawed
> to me: While the clearing of l1tab looks right, l[23]tab get
> cleared (and hence a new table allocated) too early. l2tab
> should really get cleared only when l1tab gets cleared _and_
> the L2 clearing condition is true. Similarly for l3tab then, and
> of course - even though it would unlikely ever matter -
> setup_pgtables_x86_32_pae() is broken in the same way.
> 
> Afaict this got broken with the domain build re-write between
> 3.0.4 and 3.1 (the old code looks alright).

Oh wow. Long time ago. Thanks for the pointer - will look at this
once I am through with some of the current bug log.
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-02 14:17     ` Konrad Rzeszutek Wilk
@ 2012-08-02 23:04       ` Mukesh Rathor
  2012-08-03 13:30         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Mukesh Rathor @ 2012-08-02 23:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Stefano Stabellini, Jan Beulich, xen-devel

On Thu, 2 Aug 2012 10:17:10 -0400
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Thu, Aug 02, 2012 at 10:05:27AM +0100, Jan Beulich wrote:
> > >>> On 01.08.12 at 17:50, Konrad Rzeszutek Wilk
> > >>> <konrad.wilk@oracle.com> wrote:
> > > With these patches I've gotten it to boot up to 384GB. Around
> > > that area something weird happens - mainly the pagetables that
> > > the toolstack allocated seems to have missing data. I hadn't
> > > looked in details, but this is what domain builder tells me:
> > > 
> > > 
> > > xc_dom_alloc_segment:   ramdisk      : 0xffffffff82278000 -> 
> > > 0xffffffff930b4000  (pfn 0x2278 + 0x10e3c pages)
> > > xc_dom_malloc            : 1621 kB
> > > xc_dom_pfn_to_ptr: domU mapping: pfn 0x2278+0x10e3c at
> > > 0x7fb0853a2000 xc_dom_do_gunzip: unzip ok, 0x4ba831c -> 0x10e3be10
> > > xc_dom_alloc_segment:   phys2mach    : 0xffffffff930b4000 -> 
> > > 0xffffffffc30b4000  (pfn 0x130b4 + 0x30000 pages)
> > > xc_dom_malloc            : 4608 kB
> > > xc_dom_pfn_to_ptr: domU mapping: pfn 0x130b4+0x30000 at
> > > 0x7fb0553a2000 xc_dom_alloc_page   :   start info   :
> > > 0xffffffffc30b4000 (pfn 0x430b4) xc_dom_alloc_page   :
> > > xenstore     : 0xffffffffc30b5000 (pfn 0x430b5)
> > > xc_dom_alloc_page   :   console      : 0xffffffffc30b6000 (pfn
> > > 0x430b6) nr_page_tables: 0x0000ffffffffffff/48:
> > > 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
> > > nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 ->
> > > 0xffffffffffffffff, 1 table(s) nr_page_tables:
> > > 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffffffffff,
> > > 2 table(s) nr_page_tables: 0x00000000001fffff/21:
> > > 0xffffffff80000000 -> 0xffffffffc33fffff, 538 table(s)
> > > xc_dom_alloc_segment:   page tables  : 0xffffffffc30b7000 -> 
> > > 0xffffffffc32d5000  (pfn 0x430b7 + 0x21e pages)
> > > xc_dom_pfn_to_ptr: domU mapping: pfn 0x430b7+0x21e at
> > > 0x7fb055184000 xc_dom_alloc_page   :   boot stack   :
> > > 0xffffffffc32d5000 (pfn 0x432d5) xc_dom_build_image  :
> > > virt_alloc_end : 0xffffffffc32d6000 xc_dom_build_image  :
> > > virt_pgtab_end : 0xffffffffc3400000
> > > 
> > > Note it is is 0xffffffffc30b4000 - so already past the
> > > level2_kernel_pgt (L3[510]
> > > and in level2_fixmap_pgt territory (L3[511]).
> > > 
> > > At that stage we are still operating using the Xen provided
> > > pagetable - which look to have the L4[511][511] empty! Which
> > > sounds to me like a Xen tool-stack problem? Jan, have you seen
> > > something similar to this?
> > 
> > No we haven't, but I also don't think anyone tried to create as
> > big a DomU. I was, however, under the impression that DomU-s
> > this big had been created at Oracle before. Or was that only up
> > to 256Gb perhaps?
> 
> Mukesh do you recall? Was it with OVM2.2.2 which was 3.4 based?
> It might be that we did not have the 1TB hardware at that time yet.

Yes, in OVM 2.x, I debugged/booted up to a 500GB domU. So it looks like
something got broken after that. I can debug later if it becomes hot.

thanks,
Mukesh


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-02 23:04       ` Mukesh Rathor
@ 2012-08-03 13:30         ` Konrad Rzeszutek Wilk
  2012-08-03 13:54           ` Jan Beulich
  2012-08-03 18:37           ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Mukesh Rathor
  0 siblings, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-03 13:30 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: Stefano Stabellini, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

> > > > Note it is is 0xffffffffc30b4000 - so already past the
> > > > level2_kernel_pgt (L3[510]
> > > > and in level2_fixmap_pgt territory (L3[511]).
> > > > 
> > > > At that stage we are still operating using the Xen provided
> > > > pagetable - which look to have the L4[511][511] empty! Which
> > > > sounds to me like a Xen tool-stack problem? Jan, have you seen
> > > > something similar to this?
> > > 
> > > No we haven't, but I also don't think anyone tried to create as
> > > big a DomU. I was, however, under the impression that DomU-s
> > > this big had been created at Oracle before. Or was that only up
> > > to 256Gb perhaps?
> > 
> > Mukesh do you recall? Was it with OVM2.2.2 which was 3.4 based?
> > It might be that we did not have the 1TB hardware at that time yet.
> 
> Yes, in ovm2.x, I debugged/booted upto 500GB domU. So something
> got broken after it looks like. I can debug later if it becomes hot. 

I got the kernel part fixed, but it's the toolstack that has bugs in it.
If you recall - were there any patches in the toolstack for this, or
did you just concentrate on the kernel?
Thanks!


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-03 13:30         ` Konrad Rzeszutek Wilk
@ 2012-08-03 13:54           ` Jan Beulich
       [not found]             ` <CAPbh3rsXaqQS9WQQmJ2uQ46LZdyFzkbSodUabGDAyFS+qTEwUg@mail.gmail.com>
  2012-08-03 18:37           ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Mukesh Rathor
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2012-08-03 13:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Konrad Rzeszutek Wilk, Stefano Stabellini

>>> On 03.08.12 at 15:30, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>> > > > Note it is is 0xffffffffc30b4000 - so already past the
>> > > > level2_kernel_pgt (L3[510]
>> > > > and in level2_fixmap_pgt territory (L3[511]).
>> > > > 
>> > > > At that stage we are still operating using the Xen provided
>> > > > pagetable - which look to have the L4[511][511] empty! Which
>> > > > sounds to me like a Xen tool-stack problem? Jan, have you seen
>> > > > something similar to this?
>> > > 
>> > > No we haven't, but I also don't think anyone tried to create as
>> > > big a DomU. I was, however, under the impression that DomU-s
>> > > this big had been created at Oracle before. Or was that only up
>> > > to 256Gb perhaps?
>> > 
>> > Mukesh do you recall? Was it with OVM2.2.2 which was 3.4 based?
>> > It might be that we did not have the 1TB hardware at that time yet.
>> 
>> Yes, in ovm2.x, I debugged/booted upto 500GB domU. So something
>> got broken after it looks like. I can debug later if it becomes hot. 
> 
> I got the kernel part fixed but its the toolstack that got bugs in it.

So did you try the suggested fix? Or are you waiting for me to
put this in patch form?

Jan

> If you recall - where there any patches in the toolstack for this or
> did you just concentrate on the kernel?
> Thanks!


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-03 13:30         ` Konrad Rzeszutek Wilk
  2012-08-03 13:54           ` Jan Beulich
@ 2012-08-03 18:37           ` Mukesh Rathor
  1 sibling, 0 replies; 27+ messages in thread
From: Mukesh Rathor @ 2012-08-03 18:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Jan Beulich, Stefano Stabellini

On Fri, 3 Aug 2012 09:30:01 -0400
Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:

> > > > > Note it is is 0xffffffffc30b4000 - so already past the
> > > > > level2_kernel_pgt (L3[510]
> > > > > and in level2_fixmap_pgt territory (L3[511]).
> > > > > 
> > > > > At that stage we are still operating using the Xen provided
> > > > > pagetable - which look to have the L4[511][511] empty! Which
> > > > > sounds to me like a Xen tool-stack problem? Jan, have you seen
> > > > > something similar to this?
> > > > 
> > > > No we haven't, but I also don't think anyone tried to create as
> > > > big a DomU. I was, however, under the impression that DomU-s
> > > > this big had been created at Oracle before. Or was that only up
> > > > to 256Gb perhaps?
> > > 
> > > Mukesh do you recall? Was it with OVM2.2.2 which was 3.4 based?
> > > It might be that we did not have the 1TB hardware at that time
> > > yet.
> > 
> > Yes, in ovm2.x, I debugged/booted upto 500GB domU. So something
> > got broken after it looks like. I can debug later if it becomes
> > hot. 
> 
> I got the kernel part fixed but its the toolstack that got bugs in it.
> If you recall - where there any patches in the toolstack for this or
> did you just concentrate on the kernel?

Ah, I remember, it was an issue in the tool stack, xm, so I punted it to
the tools experts. They were busy, so we hoped xl would fix it.


* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
       [not found]             ` <CAPbh3rsXaqQS9WQQmJ2uQ46LZdyFzkbSodUabGDAyFS+qTEwUg@mail.gmail.com>
@ 2012-08-13  7:54               ` Jan Beulich
  2012-09-03  6:33                 ` Ping: " Jan Beulich
  2013-08-27 20:34                 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 27+ messages in thread
From: Jan Beulich @ 2012-08-13  7:54 UTC (permalink / raw)
  To: konrad; +Cc: xen-devel

>>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> Didn't get to it yet. Sorry for top posting. If you have a patch ready I
> can test it on Monday - travelling now.

So here's what I was thinking of (compile tested only).

Jan

--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -241,7 +241,7 @@ static int setup_pgtables_x86_32_pae(str
     l3_pgentry_64_t *l3tab;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    unsigned long l3off, l2off, l1off;
+    unsigned long l3off, l2off = 0, l1off;
     xen_vaddr_t addr;
     xen_pfn_t pgpfn;
     xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
@@ -283,8 +283,6 @@ static int setup_pgtables_x86_32_pae(str
             l2off = l2_table_offset_pae(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
-                l2tab = NULL;
             l1pfn++;
         }
 
@@ -296,8 +294,13 @@ static int setup_pgtables_x86_32_pae(str
         if ( (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
         if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
+        {
             l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
+                l2tab = NULL;
+        }
     }
 
     if ( dom->virt_pgtab_end <= 0xc0000000 )
@@ -340,7 +343,7 @@ static int setup_pgtables_x86_64(struct 
     l3_pgentry_64_t *l3tab = NULL;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    uint64_t l4off, l3off, l2off, l1off;
+    uint64_t l4off, l3off = 0, l2off = 0, l1off;
     uint64_t addr;
     xen_pfn_t pgpfn;
 
@@ -364,8 +367,6 @@ static int setup_pgtables_x86_64(struct 
             l3off = l3_table_offset_x86_64(addr);
             l3tab[l3off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
-            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l3tab = NULL;
             l2pfn++;
         }
 
@@ -376,8 +377,6 @@ static int setup_pgtables_x86_64(struct 
             l2off = l2_table_offset_x86_64(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l2tab = NULL;
             l1pfn++;
         }
 
@@ -389,8 +388,17 @@ static int setup_pgtables_x86_64(struct 
         if ( (addr >= dom->pgtables_seg.vstart) && 
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
         if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
+        {
             l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+            {
+                l2tab = NULL;
+                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+                    l3tab = NULL;
+            }
+        }
     }
     return 0;
 }
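
In essence, each table pointer is now dropped only once the final entry of
the page it maps has actually been consumed, with the drop cascading outward
one level at a time, rather than the moment the directory's last entry is
first written. A stripped-down sketch of that pattern, with made-up names and
none of the libxc plumbing, purely as an illustration of the corrected loop
shape:

#include <stdint.h>

#define ENTRIES 512u         /* page-table entries per level on x86-64 */

/*
 * Illustrative only.  The have_* flags stand in for "a table page of
 * that level is currently mapped", and the "!have_*" branches stand in
 * for taking the next pre-allocated table page and installing it in the
 * parent slot.  Dropping a flag as soon as the parent's last slot is
 * written (the old behaviour) would make the very next iteration hook a
 * second, unbudgeted page into that same slot; deferring the drop until
 * the innermost index wraps avoids that.
 */
static void build(uint64_t nr_pages)
{
    int have_l3 = 0, have_l2 = 0, have_l1 = 0;
    uint64_t pfn;

    for ( pfn = 0; pfn < nr_pages; pfn++ )
    {
        uint64_t l1off = pfn % ENTRIES;
        uint64_t l2off = (pfn / ENTRIES) % ENTRIES;
        uint64_t l3off = (pfn / ENTRIES / ENTRIES) % ENTRIES;

        if ( !have_l3 ) have_l3 = 1;   /* hook a new L3 into the L4 slot */
        if ( !have_l2 ) have_l2 = 1;   /* hook a new L2 into L3[l3off]   */
        if ( !have_l1 ) have_l1 = 1;   /* hook a new L1 into L2[l2off]   */

        /* ... install the mapping for pfn into L1[l1off] ... */

        if ( l1off == ENTRIES - 1 )           /* finished this L1 page   */
        {
            have_l1 = 0;
            if ( l2off == ENTRIES - 1 )       /* ... and its whole L2    */
            {
                have_l2 = 0;
                if ( l3off == ENTRIES - 1 )   /* ... and its whole L3    */
                    have_l3 = 0;
            }
        }
    }
}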

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Ping: Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-13  7:54               ` Jan Beulich
@ 2012-09-03  6:33                 ` Jan Beulich
  2012-09-06 21:03                   ` Konrad Rzeszutek Wilk
  2013-08-27 20:34                 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2012-09-03  6:33 UTC (permalink / raw)
  To: konrad, Konrad Rzeszutek Wilk; +Cc: xen-devel

>>> On 13.08.12 at 09:54, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>> Didn't get to it yet. Sorry for top posting. If you have a patch ready I
>> can test it on Monday - travelling now.
> 
> So here's what I was thinking of (compile tested only).

Obviously, if this works, I'd like to see this included in 4.2 (and
4.1-testing).

Jan

> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -241,7 +241,7 @@ static int setup_pgtables_x86_32_pae(str
>      l3_pgentry_64_t *l3tab;
>      l2_pgentry_64_t *l2tab = NULL;
>      l1_pgentry_64_t *l1tab = NULL;
> -    unsigned long l3off, l2off, l1off;
> +    unsigned long l3off, l2off = 0, l1off;
>      xen_vaddr_t addr;
>      xen_pfn_t pgpfn;
>      xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
> @@ -283,8 +283,6 @@ static int setup_pgtables_x86_32_pae(str
>              l2off = l2_table_offset_pae(addr);
>              l2tab[l2off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> -            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
> -                l2tab = NULL;
>              l1pfn++;
>          }
>  
> @@ -296,8 +294,13 @@ static int setup_pgtables_x86_32_pae(str
>          if ( (addr >= dom->pgtables_seg.vstart) &&
>               (addr < dom->pgtables_seg.vend) )
>              l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
> +
>          if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
> +        {
>              l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
> +                l2tab = NULL;
> +        }
>      }
>  
>      if ( dom->virt_pgtab_end <= 0xc0000000 )
> @@ -340,7 +343,7 @@ static int setup_pgtables_x86_64(struct 
>      l3_pgentry_64_t *l3tab = NULL;
>      l2_pgentry_64_t *l2tab = NULL;
>      l1_pgentry_64_t *l1tab = NULL;
> -    uint64_t l4off, l3off, l2off, l1off;
> +    uint64_t l4off, l3off = 0, l2off = 0, l1off;
>      uint64_t addr;
>      xen_pfn_t pgpfn;
>  
> @@ -364,8 +367,6 @@ static int setup_pgtables_x86_64(struct 
>              l3off = l3_table_offset_x86_64(addr);
>              l3tab[l3off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
> -            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> -                l3tab = NULL;
>              l2pfn++;
>          }
>  
> @@ -376,8 +377,6 @@ static int setup_pgtables_x86_64(struct 
>              l2off = l2_table_offset_x86_64(addr);
>              l2tab[l2off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> -            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> -                l2tab = NULL;
>              l1pfn++;
>          }
>  
> @@ -389,8 +388,17 @@ static int setup_pgtables_x86_64(struct 
>          if ( (addr >= dom->pgtables_seg.vstart) && 
>               (addr < dom->pgtables_seg.vend) )
>              l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
> +
>          if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
> +        {
>              l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> +            {
> +                l2tab = NULL;
> +                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> +                    l3tab = NULL;
> +            }
> +        }
>      }
>      return 0;
>  }
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Ping: Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-09-03  6:33                 ` Ping: " Jan Beulich
@ 2012-09-06 21:03                   ` Konrad Rzeszutek Wilk
  2012-09-07  9:01                     ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-06 21:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: konrad, xen-devel, Konrad Rzeszutek Wilk

On Mon, Sep 03, 2012 at 07:33:24AM +0100, Jan Beulich wrote:
> >>> On 13.08.12 at 09:54, "Jan Beulich" <JBeulich@suse.com> wrote:
> >>>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> >> Didn't get to it yet. Sorry for top posting. If you have a patch ready I
> >> can test it on Monday - travelling now.
> > 
> > So here's what I was thinking of (compile tested only).
> 
> Obviously, if this works, I'd like to see this included in 4.2 (and
> 4.1-testing).

No luck. I still get:

(XEN) Pagetable walk from ffff8800443da070:
(XEN)  L4[0x110] = 0000009342f95067 0000000000001a0c
(XEN)  L3[0x001] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 61 (vcpu#0) crashed on cpu#97:
(XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    97
(XEN) RIP:    e033:[<ffffffff81abf971>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest
(XEN) rax: ffff8800443da000   rbx: 0000000000000000   rcx: 0000000000000001
(XEN) rdx: ffffffff81f76000   rsi: 0000000000000000   rdi: 0000000000000006
(XEN) rbp: ffffffff81a01ff8   rsp: ffffffff81a01f70   r8:  0000000000000000
(XEN) r9:  00000000443e0000   r10: 0000000000225000   r11: 0000008b00fc2067
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000009342f96000   cr2: ffff8800443da070
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81a01f70:
(XEN)    0000000000000001 0000008b00fc2067 0000000000000000 ffffffff81abf971
(XEN)    000000010000e030 0000000000010046 ffffffff81a01fb8 000000000000e02b
(XEN)    ffffffff81abf918 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 80b822011f898975 000206e537200800 0000000000000001
(XEN)    0000000000000000 0000000000000000 0f00000060c0c748 ccccccccccccc305
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
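
For reference, the faulting address in that walk already narrows things down.
A quick decode (the 4-level split and the 0xffff880000000000 direct-map base
assumed here are the standard x86-64 values, not something taken from the
dump itself):

#include <stdio.h>
#include <stdint.h>

/* Decode the page-table indices of the faulting address shown above. */
int main(void)
{
    uint64_t addr = 0xffff8800443da070ULL;

    printf("L4 index: 0x%llx\n", (unsigned long long)((addr >> 39) & 0x1ff)); /* 0x110 */
    printf("L3 index: 0x%llx\n", (unsigned long long)((addr >> 30) & 0x1ff)); /* 0x001 */

    /* 0xffff8800443da070 lies in the 1:1 direct-map (__va) region of the
     * standard x86-64 layout, well below the __ka range at
     * 0xffffffff80000000 where the domain builder places the kernel
     * mappings discussed in this thread. */
    return 0;
}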

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Ping: Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-09-06 21:03                   ` Konrad Rzeszutek Wilk
@ 2012-09-07  9:01                     ` Jan Beulich
  2012-09-07 13:39                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2012-09-07  9:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: konrad, Konrad Rzeszutek Wilk, xen-devel

>>> On 06.09.12 at 23:03, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
> On Mon, Sep 03, 2012 at 07:33:24AM +0100, Jan Beulich wrote:
>> >>> On 13.08.12 at 09:54, "Jan Beulich" <JBeulich@suse.com> wrote:
>> >>>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>> >> Didn't get to it yet. Sorry for top posting. If you have a patch ready I
>> >> can test it on Monday - travelling now.
>> > 
>> > So here's what I was thinking of (compile tested only).
>> 
>> Obviously, if this works, I'd like to see this included in 4.2 (and
>> 4.1-testing).
> 
> No luck. I still get:
> 
> (XEN) Pagetable walk from ffff8800443da070:
> (XEN)  L4[0x110] = 0000009342f95067 0000000000001a0c
> (XEN)  L3[0x001] = 0000000000000000 ffffffffffffffff

And I can't see why. I wasn't able to track down the original
stack trace you saw on the archives - was that identical to
this one (i.e. nothing changed at all)? If so (please forgive
that I'm asking, I just know that I happen to fall into this trap
once in a while myself), did you indeed build and install the
patched tools? In that case, adding some logging to the code
in question is presumably the only alternative, short of
anyone else seeing anything further wrong with that code.

Jan
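
A minimal sketch of the sort of logging being suggested - assuming the
printf-style DOMPRINTF macro used elsewhere in libxc, and dropped into the
"l2tab == NULL" branch of setup_pgtables_x86_64() in
tools/libxc/xc_dom_x86.c, right after the new L2 page is hooked into
l3tab[l3off]:

/* Illustrative only: trace each directory switch-over so that a hole
 * such as the reported empty L4[511][511] entry would show up in the
 * domain builder log instead of only as a crash at guest boot. */
DOMPRINTF("%s: vaddr 0x%llx: new L2 page (pfn 0x%llx) in L3 slot %llu",
          __FUNCTION__, (unsigned long long)addr,
          (unsigned long long)l2pfn, (unsigned long long)l3off);

The debug patch attached later in the thread takes the same idea further,
making the old and the corrected switch-over logic selectable from the guest
config so both can be compared without rebuilding.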

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Ping: Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-09-07  9:01                     ` Jan Beulich
@ 2012-09-07 13:39                       ` Konrad Rzeszutek Wilk
  2012-09-07 14:09                         ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-07 13:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel, konrad

On Fri, Sep 07, 2012 at 10:01:36AM +0100, Jan Beulich wrote:
> >>> On 06.09.12 at 23:03, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
> > On Mon, Sep 03, 2012 at 07:33:24AM +0100, Jan Beulich wrote:
> >> >>> On 13.08.12 at 09:54, "Jan Beulich" <JBeulich@suse.com> wrote:
> >> >>>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> >> >> Didn't get to it yet. Sorry for top posting. If you have a patch ready I
> >> >> can test it on Monday - travelling now.
> >> > 
> >> > So here's what I was thinking of (compile tested only).
> >> 
> >> Obviously, if this works, I'd like to see this included in 4.2 (and
> >> 4.1-testing).
> > 
> > No luck. I still get:
> > 
> > (XEN) Pagetable walk from ffff8800443da070:
> > (XEN)  L4[0x110] = 0000009342f95067 0000000000001a0c
> > (XEN)  L3[0x001] = 0000000000000000 ffffffffffffffff
> 
> And I can't see why. I wasn't able to track down the original
> stack trace you saw on the archives - was that identical to
> this one (i.e. nothing changed at all)? If so (please forgive

It does look identical.

> that I'm asking, I just know that I happen to fall into this trap
> once in a while myself), did you indeed build and install the

I know. I did double check - as I couldn't install wholesale the
new RPM (owner of the box needed the old version of it), instead
I did this bit of hack:

xend stop
cd /konrad
rpm2cpio xen-*konrad* | cpio -id
tar -czvf /xen.orig.tgz /usr/lib64/*xen*
rm -Rf /usr/lib64/*xen*
mv /usr/lib/python2.4/site-packages/xen /usr/lib/python2.4/site-packages/xen.old
ln -s /konrad/usr/lib/python2.4/site-packages/xen  /usr/lib/python2.4/site-packages/xen 
export PATH=/konrad/usr/bin:/konrad/usr/sbin:$PATH
export LD_LIBRARY_PATH=/konrad/usr/lib64

xend start
xm create /konrad/test.xm

Which _should_ have taken care of everything in the toolstack.

> patched tools? In that case, adding some logging to the code
> in question is presumably the only alternative, short of
> anyone else seeing anything further wrong with that code.

That was my next thought too... also, that would verify that
my hac^H^H^Hinstallation worked properly.

> 
> Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Ping: Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-09-07 13:39                       ` Konrad Rzeszutek Wilk
@ 2012-09-07 14:09                         ` Jan Beulich
  2012-09-07 14:11                           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2012-09-07 14:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: konrad, xen-devel, Konrad Rzeszutek Wilk

>>> On 07.09.12 at 15:39, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Sep 07, 2012 at 10:01:36AM +0100, Jan Beulich wrote:
>> that I'm asking, I just know that I happen to fall into this trap
>> once in a while myself), did you indeed build and install the
> 
> I know. I did double check - as I couldn't install wholesale the
> new RPM (owner of the box needed the old version of it), instead
> I did this bit of hack:
> 
> xend stop
> cd /konrad
> rpm2cpio xen-*konrad* | cpio -id
> tar -czvf /xen.orig.tgz /usr/lib64/*xen*
> rm -Rf /usr/lib64/*xen*

So here you removed the old libraries. But where did you drop in
the new ones? Did you just forget to list this here?

> mv /usr/lib/python2.4/site-packages/xen /usr/lib/python2.4/site-packages/xen.old
> ln -s /konrad/usr/lib/python2.4/site-packages/xen  /usr/lib/python2.4/site-packages/xen 
> export PATH=/konrad/usr/bin:/konrad/usr/sbin:$PATH
> export LD_LIBRARY_PATH=/konrad/usr/lib64
> 
> xend start
> xm create /konrad/test.xm

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Ping: Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-09-07 14:09                         ` Jan Beulich
@ 2012-09-07 14:11                           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-07 14:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: konrad, xen-devel, Konrad Rzeszutek Wilk

On Fri, Sep 07, 2012 at 03:09:00PM +0100, Jan Beulich wrote:
> >>> On 07.09.12 at 15:39, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Fri, Sep 07, 2012 at 10:01:36AM +0100, Jan Beulich wrote:
> >> that I'm asking, I just know that I happen to fall into this trap
> >> once in a while myself), did you indeed build and install the
> > 
> > I know. I did double check - as I couldn't install wholesale the
> > new RPM (owner of the box needed the old version of it), instead
> > I did this bit of hack:
> > 
> > xend stop
> > cd /konrad
> > rpm2cpio xen-*konrad* | cpio -id
> > tar -czvf /xen.orig.tgz /usr/lib64/*xen*
> > rm -Rf /usr/lib64/*xen*
> 
> So here you removed the old libraries. But where did you drop in
> the new ones? Did you just forget to list this here?

There was no need, since LD_LIBRARY_PATH did the override.
Removing them was just to make doubly sure that the old libs wouldn't be used.

> 
> > mv /usr/lib/python2.4/site-packages/xen /usr/lib/python2.4/site-packages/xen.old
> > ln -s /konrad/usr/lib/python2.4/site-packages/xen  /usr/lib/python2.4/site-packages/xen 
> > export PATH=/konrad/usr/bin:/konrad/usr/sbin:$PATH
> > export LD_LIBRARY_PATH=/konrad/usr/lib64
> > 
> > xend start
> > xm create /konrad/test.xm
> 
> Jan
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2012-08-13  7:54               ` Jan Beulich
  2012-09-03  6:33                 ` Ping: " Jan Beulich
@ 2013-08-27 20:34                 ` Konrad Rzeszutek Wilk
  2013-08-28  7:55                   ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-27 20:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: konrad, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3766 bytes --]

On Mon, Aug 13, 2012 at 08:54:47AM +0100, Jan Beulich wrote:
> >>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> > Didn't get to it yet. Sorry for top posting. If you have a patch ready I
> > can test it on Monday - travelling now.
> 
> So here's what I was thinking of (compile tested only).

Wow. It took me a whole year to get back to this.

Anyhow I did test it and it worked rather nicely for 64-bit guests. I didn't
even try to boot 32-bit guests as the pvops changes I did were only for 64-bit
guests. But if you have a specific kernel for a 32-bit guest I still have
the 1TB machine for a week and can boot it up there.


> 
> Jan
> 
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -241,7 +241,7 @@ static int setup_pgtables_x86_32_pae(str
>      l3_pgentry_64_t *l3tab;
>      l2_pgentry_64_t *l2tab = NULL;
>      l1_pgentry_64_t *l1tab = NULL;
> -    unsigned long l3off, l2off, l1off;
> +    unsigned long l3off, l2off = 0, l1off;
>      xen_vaddr_t addr;
>      xen_pfn_t pgpfn;
>      xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
> @@ -283,8 +283,6 @@ static int setup_pgtables_x86_32_pae(str
>              l2off = l2_table_offset_pae(addr);
>              l2tab[l2off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> -            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
> -                l2tab = NULL;
>              l1pfn++;
>          }
>  
> @@ -296,8 +294,13 @@ static int setup_pgtables_x86_32_pae(str
>          if ( (addr >= dom->pgtables_seg.vstart) &&
>               (addr < dom->pgtables_seg.vend) )
>              l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
> +
>          if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
> +        {
>              l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
> +                l2tab = NULL;
> +        }
>      }
>  
>      if ( dom->virt_pgtab_end <= 0xc0000000 )
> @@ -340,7 +343,7 @@ static int setup_pgtables_x86_64(struct 
>      l3_pgentry_64_t *l3tab = NULL;
>      l2_pgentry_64_t *l2tab = NULL;
>      l1_pgentry_64_t *l1tab = NULL;
> -    uint64_t l4off, l3off, l2off, l1off;
> +    uint64_t l4off, l3off = 0, l2off = 0, l1off;
>      uint64_t addr;
>      xen_pfn_t pgpfn;
>  
> @@ -364,8 +367,6 @@ static int setup_pgtables_x86_64(struct 
>              l3off = l3_table_offset_x86_64(addr);
>              l3tab[l3off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
> -            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> -                l3tab = NULL;
>              l2pfn++;
>          }
>  
> @@ -376,8 +377,6 @@ static int setup_pgtables_x86_64(struct 
>              l2off = l2_table_offset_x86_64(addr);
>              l2tab[l2off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> -            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> -                l2tab = NULL;
>              l1pfn++;
>          }
>  
> @@ -389,8 +388,17 @@ static int setup_pgtables_x86_64(struct 
>          if ( (addr >= dom->pgtables_seg.vstart) && 
>               (addr < dom->pgtables_seg.vend) )
>              l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
> +
>          if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
> +        {
>              l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> +            {
> +                l2tab = NULL;
> +                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> +                    l3tab = NULL;
> +            }
> +        }
>      }
>      return 0;
>  }
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

[-- Attachment #2: debug.patch --]
[-- Type: text/plain, Size: 4502 bytes --]

diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
index 86e23ee..ebc77ac 100644
--- a/tools/libxc/xc_dom.h
+++ b/tools/libxc/xc_dom.h
@@ -136,6 +136,7 @@ struct xc_dom_image {
     int8_t vhpt_size_log2; /* for IA64 */
     int8_t superpages;
     int claim_enabled; /* 0 by default, 1 enables it */
+    int fix;
     int shadow_enabled;
 
     int xen_version;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 126c0f8..57291ab 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -360,13 +360,14 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
     l3_pgentry_64_t *l3tab = NULL;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    uint64_t l4off, l3off, l2off, l1off;
+    uint64_t l4off = 0, l3off = 0, l2off = 0, l1off = 0;
     uint64_t addr;
     xen_pfn_t pgpfn;
 
     if ( l4tab == NULL )
         goto pfn_error;
-
+    
+    DOMPRINTF("%s: fix %s", __FUNCTION__, dom->fix ? "enabled" : "disabled");
     for ( addr = dom->parms.virt_base; addr < dom->virt_pgtab_end;
           addr += PAGE_SIZE_X86 )
     {
@@ -391,8 +392,10 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
             l3off = l3_table_offset_x86_64(addr);
             l3tab[l3off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
-            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l3tab = NULL;
+	    if (!dom->fix) {
+                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+                	l3tab = NULL;
+	    }
             l2pfn++;
         }
 
@@ -405,8 +408,10 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
             l2off = l2_table_offset_x86_64(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l2tab = NULL;
+            if (!dom->fix) {
+            	if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+                	l2tab = NULL;
+	    }
             l1pfn++;
         }
 
@@ -418,8 +423,17 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
         if ( (addr >= dom->pgtables_seg.vstart) && 
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
-        if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
-            l1tab = NULL;
+
+	if (dom->fix) {
+		if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) ) {
+		    l1tab = NULL;
+		    if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) ) {
+			l2tab = NULL;
+			if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+				l3tab = NULL;
+		    }
+		}
+	}
     }
     return 0;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 6e2252a..8ec8bab 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -375,6 +375,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
     dom->xenstore_evtchn = state->store_port;
     dom->xenstore_domid = state->store_domid;
     dom->claim_enabled = libxl_defbool_val(info->claim_mode);
+    dom->fix = libxl_defbool_val(info->u.pv.fix);
 
     if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
         LOGE(ERROR, "xc_dom_boot_xen_init failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 85341a0..fdda8a9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -347,6 +347,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                       ("features", string, {'const': True}),
                                       # Use host's E820 for PCI passthrough.
                                       ("e820_host", libxl_defbool),
+                                      ("fix", libxl_defbool),
                                       ])),
                  ("invalid", Struct(None, [])),
                  ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 884f050..834ff74 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1279,7 +1279,8 @@ skip_vfb:
 
     if (!xlu_cfg_get_long (config, "pci_permissive", &l, 0))
         pci_permissive = l;
-
+ 
+    xlu_cfg_get_defbool(config, "fix", &b_info->u.pv.fix, 0);
     /* To be reworked (automatically enabled) once the auto ballooning
      * after guest starts is done (with PCI devices passed in). */
     if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2013-08-27 20:34                 ` Konrad Rzeszutek Wilk
@ 2013-08-28  7:55                   ` Jan Beulich
  2013-08-28 14:44                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-08-28  7:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: konrad, xen-devel

>>> On 27.08.13 at 22:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Aug 13, 2012 at 08:54:47AM +0100, Jan Beulich wrote:
>> >>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
>> > Didn't get to it yet. Sorry for top posting. If you have a patch ready I
>> > can test it on Monday - travelling now.
>> 
>> So here's what I was thinking of (compile tested only).
> 
> Wow. It took me a whole year to get back to this.
> 
> Anyhow I did test it and it worked rather nicely for 64-bit guests. I didn't
> even try to boot 32-bit guests as the pvops changes I did were only for 64-bit
> guests. But if you have a specific kernel for a 32-bit guest I still have
> the 1TB machine for a week and can boot it up there.

Considering that you had also attached a debug patch - did it
work without that, i.e. just with the patch that I had handed
you? If so, I'd then finally be in the position to submit this,
putting your Tested-by (and perhaps Reported-by) underneath.

And no, I'm not really concerned about the 32-bit case. The
analogy with the 64-bit code is sufficient to tell that the change
(even if just cosmetic) should also be done to the 32-bit variant.

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Boot PV guests with more than 128GB (v2) for 3.7
  2013-08-28  7:55                   ` Jan Beulich
@ 2013-08-28 14:44                     ` Konrad Rzeszutek Wilk
  2013-08-28 14:58                       ` [PATCH] libxc/x86: fix page table creation for huge guests Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-08-28 14:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: konrad, xen-devel

On Wed, Aug 28, 2013 at 08:55:39AM +0100, Jan Beulich wrote:
> >>> On 27.08.13 at 22:34, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Mon, Aug 13, 2012 at 08:54:47AM +0100, Jan Beulich wrote:
> >> >>> On 03.08.12 at 16:46, Konrad Rzeszutek Wilk <konrad@darnok.org> wrote:
> >> > Didn't get to it yet. Sorry for top posting. If you have a patch ready I
> >> > can test it on Monday - travelling now.
> >> 
> >> So here's what I was thinking of (compile tested only).
> > 
> > Wow. It took me a whole year to get back to this.
> > 
> > Anyhow I did test it and it worked rather nicely for 64-bit guests. I didn't
> > even try to boot 32-bit guests as the pvops changes I did were only for 64-bit
> > guests. But if you have a specific kernel for a 32-bit guest I still have
> > the 1TB machine for a week and can boot it up there.
> 
> Considering that you had also attached a debug patch - did it
> work without that, i.e. just with the patch that I had handed
> you? If so, I'd then finally be in the position to submit this,
> putting your Tested-by (and perhaps Reported-by) underneath.

Yes, it did, with the 'memory=440000' guest config. I developed the
debug patch just to make sure I could see the failing case (fix=0) and
working case (fix=1) without having to reboot this monster machine.


Interestingly enough, if I boot a 486GB guest I end up with:

[root@ca-test111 konrad]# xl dmesg | tail -300
(XEN) d8:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffff880043e75070:
(XEN)  L4[0x110] = 00000080ba854067 0000000000001a0d
(XEN)  L3[0x001] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 8 (vcpu#0) crashed on cpu#16:
(XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    16
(XEN) RIP:    e033:[<ffffffff81acd29e>]
(XEN) RFLAGS: 0000000000000246   EM: 1   CONTEXT: pv guest
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: ffffffff8219e000
(XEN) rdx: 0000000000000000   rsi: ffff880043e75000   rdi: 00000000deadbeef
(XEN) rbp: ffffffff81a01ff8   rsp: ffffffff81a01f00   r8:  0000000043e7a000
(XEN) r9:  0000000043e7b000   r10: 0000000000223000   r11: 000000a0a66b6067
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000400fcb6000   cr2: ffff880043e75070
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81a01f00:
(XEN)    ffffffff8219e000 000000a0a66b6067 0000000000000000 ffffffff81acd29e
(XEN)    000000010000e030 0000000000010046 ffffffff81a01f48 000000000000e02b
(XEN)    ffffffff81acd267 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 809822011f898975
(XEN)    000206e501200800 0000000000000001 0000000000000000 0000000000000000
(XEN)    0f00000060c0c748 ccccccccccccc305 cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc
(XEN)    cccccccccccccccc cccccccccccccccc cccccccccccccccc cccccccccccccccc

(this is with the debug patch and the guest having 'fix=1' enabled, meaning
it uses the new code path).

Though looking at the stack more, I see:

ffffffff81acd29e is:

   0xffffffff81acd280 <xen_start_kernel+935>:   mov    $0xffffffff81931558,%rdi
   0xffffffff81acd287 <xen_start_kernel+942>:   xor    %eax,%eax
   0xffffffff81acd289 <xen_start_kernel+944>:   callq  0xffffffff813f5340 <xen_raw_printk>
   0xffffffff81acd28e <xen_start_kernel+949>:   mov    0x1a6f53(%rip),%rsi        # 0xffffffff81c741e8 <xen_start_info>
   0xffffffff81acd295 <xen_start_kernel+956>:   movb   $0x90,0x1aa454(%rip)        # 0xffffffff81c776f0 <boot_params+528>
   0xffffffff81acd29c <xen_start_kernel+963>:   xor    %edx,%edx
   0xffffffff81acd29e <xen_start_kernel+965>:   mov    0x70(%rsi),%rax

which implies that we copied something from xen_start_info
(pt_base? mod_start?) which has the __va address instead of the
__ka one. So I think the bootup pagetable creation is OK, and you
can indeed put the 'Tested-by' tag on it.

I will dig into this a bit more.
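
One crude way to check that suspicion would be a bit of early printing from
xen_start_kernel() - a hypothetical sketch, assuming the standard x86-64
layout (direct map/__va at 0xffff8800..., kernel mapping/__ka from
__START_KERNEL_map up) and the same xen_raw_printk() visible in the
disassembly above; pt_base and mod_start are just the fields being guessed
at here:

/* Hypothetical debug aid, not from the thread: report whether the
 * start_info fields of interest arrived as __ka or direct-map (__va)
 * addresses.  Assumes xen_start_info has been set up already. */
static void __init xen_dump_start_info_addrs(void)
{
    unsigned long pt  = (unsigned long)xen_start_info->pt_base;
    unsigned long mod = (unsigned long)xen_start_info->mod_start;

    xen_raw_printk("pt_base   %lx (%s)\n", pt,
                   pt  >= __START_KERNEL_map ? "__ka" : "__va/other");
    xen_raw_printk("mod_start %lx (%s)\n", mod,
                   mod >= __START_KERNEL_map ? "__ka" : "__va/other");
}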
> 
> And no, I'm not really concerned about the 32-bit case. The
> analogy with the 64-bit code is sufficient to tell that the change
> (even if just cosmetic) should also be done to the 32-bit variant.

Right.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] libxc/x86: fix page table creation for huge guests
  2013-08-28 14:44                     ` Konrad Rzeszutek Wilk
@ 2013-08-28 14:58                       ` Jan Beulich
  2013-09-09  8:37                         ` Ping: " Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-08-28 14:58 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Ian Campbell

[-- Attachment #1: Type: text/plain, Size: 3230 bytes --]

The switch-over logic from one page directory to the next was wrong;
it needs to be deferred until we actually reach the last page within
a given region, instead of being done when the last entry of a page
directory gets started with.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -251,7 +251,7 @@ static int setup_pgtables_x86_32_pae(str
     l3_pgentry_64_t *l3tab;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    unsigned long l3off, l2off, l1off;
+    unsigned long l3off, l2off = 0, l1off;
     xen_vaddr_t addr;
     xen_pfn_t pgpfn;
     xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
@@ -299,8 +299,6 @@ static int setup_pgtables_x86_32_pae(str
             l2off = l2_table_offset_pae(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
-                l2tab = NULL;
             l1pfn++;
         }
 
@@ -312,8 +310,13 @@ static int setup_pgtables_x86_32_pae(str
         if ( (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
         if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
+        {
             l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
+                l2tab = NULL;
+        }
     }
 
     if ( dom->virt_pgtab_end <= 0xc0000000 )
@@ -360,7 +363,7 @@ static int setup_pgtables_x86_64(struct 
     l3_pgentry_64_t *l3tab = NULL;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    uint64_t l4off, l3off, l2off, l1off;
+    uint64_t l4off, l3off = 0, l2off = 0, l1off;
     uint64_t addr;
     xen_pfn_t pgpfn;
 
@@ -391,8 +394,6 @@ static int setup_pgtables_x86_64(struct 
             l3off = l3_table_offset_x86_64(addr);
             l3tab[l3off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
-            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l3tab = NULL;
             l2pfn++;
         }
 
@@ -405,8 +406,6 @@ static int setup_pgtables_x86_64(struct 
             l2off = l2_table_offset_x86_64(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l2tab = NULL;
             l1pfn++;
         }
 
@@ -418,8 +417,17 @@ static int setup_pgtables_x86_64(struct 
         if ( (addr >= dom->pgtables_seg.vstart) && 
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
         if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
+        {
             l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+            {
+                l2tab = NULL;
+                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+                    l3tab = NULL;
+            }
+        }
     }
     return 0;
 




[-- Attachment #2: libxc-x86-pv-build-pt-large.patch --]
[-- Type: text/plain, Size: 3278 bytes --]

libxc/x86: fix page table creation for huge guests

The switch-over logic from one page directory to the next was wrong;
it needs to be deferred until we actually reach the last page within
a given region, instead of being done when the last entry of a page
directory gets started with.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -251,7 +251,7 @@ static int setup_pgtables_x86_32_pae(str
     l3_pgentry_64_t *l3tab;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    unsigned long l3off, l2off, l1off;
+    unsigned long l3off, l2off = 0, l1off;
     xen_vaddr_t addr;
     xen_pfn_t pgpfn;
     xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
@@ -299,8 +299,6 @@ static int setup_pgtables_x86_32_pae(str
             l2off = l2_table_offset_pae(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
-                l2tab = NULL;
             l1pfn++;
         }
 
@@ -312,8 +310,13 @@ static int setup_pgtables_x86_32_pae(str
         if ( (addr >= dom->pgtables_seg.vstart) &&
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
         if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
+        {
             l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
+                l2tab = NULL;
+        }
     }
 
     if ( dom->virt_pgtab_end <= 0xc0000000 )
@@ -360,7 +363,7 @@ static int setup_pgtables_x86_64(struct 
     l3_pgentry_64_t *l3tab = NULL;
     l2_pgentry_64_t *l2tab = NULL;
     l1_pgentry_64_t *l1tab = NULL;
-    uint64_t l4off, l3off, l2off, l1off;
+    uint64_t l4off, l3off = 0, l2off = 0, l1off;
     uint64_t addr;
     xen_pfn_t pgpfn;
 
@@ -391,8 +394,6 @@ static int setup_pgtables_x86_64(struct 
             l3off = l3_table_offset_x86_64(addr);
             l3tab[l3off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
-            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l3tab = NULL;
             l2pfn++;
         }
 
@@ -405,8 +406,6 @@ static int setup_pgtables_x86_64(struct 
             l2off = l2_table_offset_x86_64(addr);
             l2tab[l2off] =
                 pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
-            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
-                l2tab = NULL;
             l1pfn++;
         }
 
@@ -418,8 +417,17 @@ static int setup_pgtables_x86_64(struct 
         if ( (addr >= dom->pgtables_seg.vstart) && 
              (addr < dom->pgtables_seg.vend) )
             l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
+
         if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
+        {
             l1tab = NULL;
+            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
+            {
+                l2tab = NULL;
+                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
+                    l3tab = NULL;
+            }
+        }
     }
     return 0;
 

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Ping: [PATCH] libxc/x86: fix page table creation for huge guests
  2013-08-28 14:58                       ` [PATCH] libxc/x86: fix page table creation for huge guests Jan Beulich
@ 2013-09-09  8:37                         ` Jan Beulich
  2013-09-12 15:38                           ` Ian Jackson
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-09-09  8:37 UTC (permalink / raw)
  To: Ian Campbell, Ian Jackson; +Cc: xen-devel

Ping?

>>> On 28.08.13 at 16:58, "Jan Beulich" <JBeulich@suse.com> wrote:
> The switch-over logic from one page directory to the next was wrong;
> it needs to be deferred until we actually reach the last page within
> a given region, instead of being done when the last entry of a page
> directory gets started with.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -251,7 +251,7 @@ static int setup_pgtables_x86_32_pae(str
>      l3_pgentry_64_t *l3tab;
>      l2_pgentry_64_t *l2tab = NULL;
>      l1_pgentry_64_t *l1tab = NULL;
> -    unsigned long l3off, l2off, l1off;
> +    unsigned long l3off, l2off = 0, l1off;
>      xen_vaddr_t addr;
>      xen_pfn_t pgpfn;
>      xen_pfn_t l3mfn = xc_dom_p2m_guest(dom, l3pfn);
> @@ -299,8 +299,6 @@ static int setup_pgtables_x86_32_pae(str
>              l2off = l2_table_offset_pae(addr);
>              l2tab[l2off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> -            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
> -                l2tab = NULL;
>              l1pfn++;
>          }
>  
> @@ -312,8 +310,13 @@ static int setup_pgtables_x86_32_pae(str
>          if ( (addr >= dom->pgtables_seg.vstart) &&
>               (addr < dom->pgtables_seg.vend) )
>              l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
> +
>          if ( l1off == (L1_PAGETABLE_ENTRIES_PAE - 1) )
> +        {
>              l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_PAE - 1) )
> +                l2tab = NULL;
> +        }
>      }
>  
>      if ( dom->virt_pgtab_end <= 0xc0000000 )
> @@ -360,7 +363,7 @@ static int setup_pgtables_x86_64(struct 
>      l3_pgentry_64_t *l3tab = NULL;
>      l2_pgentry_64_t *l2tab = NULL;
>      l1_pgentry_64_t *l1tab = NULL;
> -    uint64_t l4off, l3off, l2off, l1off;
> +    uint64_t l4off, l3off = 0, l2off = 0, l1off;
>      uint64_t addr;
>      xen_pfn_t pgpfn;
>  
> @@ -391,8 +394,6 @@ static int setup_pgtables_x86_64(struct 
>              l3off = l3_table_offset_x86_64(addr);
>              l3tab[l3off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l2pfn)) | L3_PROT;
> -            if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> -                l3tab = NULL;
>              l2pfn++;
>          }
>  
> @@ -405,8 +406,6 @@ static int setup_pgtables_x86_64(struct 
>              l2off = l2_table_offset_x86_64(addr);
>              l2tab[l2off] =
>                  pfn_to_paddr(xc_dom_p2m_guest(dom, l1pfn)) | L2_PROT;
> -            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> -                l2tab = NULL;
>              l1pfn++;
>          }
>  
> @@ -418,8 +417,17 @@ static int setup_pgtables_x86_64(struct 
>          if ( (addr >= dom->pgtables_seg.vstart) && 
>               (addr < dom->pgtables_seg.vend) )
>              l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
> +
>          if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
> +        {
>              l1tab = NULL;
> +            if ( l2off == (L2_PAGETABLE_ENTRIES_X86_64 - 1) )
> +            {
> +                l2tab = NULL;
> +                if ( l3off == (L3_PAGETABLE_ENTRIES_X86_64 - 1) )
> +                    l3tab = NULL;
> +            }
> +        }
>      }
>      return 0;
>  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Ping: [PATCH] libxc/x86: fix page table creation for huge guests
  2013-09-09  8:37                         ` Ping: " Jan Beulich
@ 2013-09-12 15:38                           ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2013-09-12 15:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Ian Campbell

Jan Beulich writes ("Ping: [PATCH] libxc/x86: fix page table creation for huge guests"):
> Ping?
> 
> >>> On 28.08.13 at 16:58, "Jan Beulich" <JBeulich@suse.com> wrote:
> > The switch-over logic from one page directory to the next was wrong;
> > it needs to be deferred until we actually reach the last page within
> > a given region, instead of being done when the last entry of a page
> > directory gets started with.
> > 
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-09-12 15:38 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-31 14:43 [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
2012-07-31 14:43 ` [PATCH 1/6] xen/mmu: use copy_page instead of memcpy Konrad Rzeszutek Wilk
2012-07-31 14:43 ` [PATCH 2/6] xen/mmu: For 64-bit do not call xen_map_identity_early Konrad Rzeszutek Wilk
2012-07-31 14:43 ` [PATCH 3/6] xen/mmu: Recycle the Xen provided L4, L3, and L2 pages Konrad Rzeszutek Wilk
2012-07-31 14:43 ` [PATCH 4/6] xen/p2m: Add logic to revector a P2M tree to use __va leafs Konrad Rzeszutek Wilk
2012-07-31 14:43 ` [PATCH 5/6] xen/mmu: Copy and revector the P2M tree Konrad Rzeszutek Wilk
2012-07-31 14:43 ` [PATCH 6/6] xen/mmu: Remove from __ka space PMD entries for pagetables Konrad Rzeszutek Wilk
2012-08-01 15:50 ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Konrad Rzeszutek Wilk
2012-08-02  9:05   ` Jan Beulich
2012-08-02 14:17     ` Konrad Rzeszutek Wilk
2012-08-02 23:04       ` Mukesh Rathor
2012-08-03 13:30         ` Konrad Rzeszutek Wilk
2012-08-03 13:54           ` Jan Beulich
     [not found]             ` <CAPbh3rsXaqQS9WQQmJ2uQ46LZdyFzkbSodUabGDAyFS+qTEwUg@mail.gmail.com>
2012-08-13  7:54               ` Jan Beulich
2012-09-03  6:33                 ` Ping: " Jan Beulich
2012-09-06 21:03                   ` Konrad Rzeszutek Wilk
2012-09-07  9:01                     ` Jan Beulich
2012-09-07 13:39                       ` Konrad Rzeszutek Wilk
2012-09-07 14:09                         ` Jan Beulich
2012-09-07 14:11                           ` Konrad Rzeszutek Wilk
2013-08-27 20:34                 ` Konrad Rzeszutek Wilk
2013-08-28  7:55                   ` Jan Beulich
2013-08-28 14:44                     ` Konrad Rzeszutek Wilk
2013-08-28 14:58                       ` [PATCH] libxc/x86: fix page table creation for huge guests Jan Beulich
2013-09-09  8:37                         ` Ping: " Jan Beulich
2013-09-12 15:38                           ` Ian Jackson
2012-08-03 18:37           ` [PATCH] Boot PV guests with more than 128GB (v2) for 3.7 Mukesh Rathor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).