* [PATCH v2 0/3] arm64: KVM: High memory guest fixes
@ 2015-03-10 19:06 Marc Zyngier
2015-03-10 19:06 ` [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-03-10 19:06 UTC (permalink / raw)
To: linux-arm-kernel
I've been playing with a hacked-up kvmtool that places the memory much
higher than usual, in an effort to emulate what some arm64 platforms
do (typically AMD Seattle).
Unexpectedly (hey, what could possibly go wrong...), I've discovered a
couple of mm-related issues, ranging from the host exploding to the
guest being stuck on a page fault. Additionally, I noticed a minor
documentation issue.
With these fixes in, I'm able to run a guest that has its memory
located just under the 40-bit limit.
* From v1 [1]:
- Reworked patch #1 to use split_page() instead. This leads to much
refactoring, but is slightly nicer (more common code).
- Dropped Mark's Tested-by (too many changes)
[1]: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-February/013672.html
Marc Zyngier (3):
arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
arm64: KVM: Do not use pgd_index to index stage-2 pgd
arm64: KVM: Fix outdated comment about VTCR_EL2.PS
 arch/arm/include/asm/kvm_mmu.h   | 12 ++----
 arch/arm/kvm/mmu.c               | 85 ++++++++++++++++++++++++++++++----------
 arch/arm64/include/asm/kvm_arm.h |  5 ++-
 arch/arm64/include/asm/kvm_mmu.h | 48 +++--------------------
 4 files changed, 77 insertions(+), 73 deletions(-)
--
2.1.4
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
  2015-03-10 19:06 [PATCH v2 0/3] arm64: KVM: High memory guest fixes Marc Zyngier
@ 2015-03-10 19:06 ` Marc Zyngier
  2015-03-11 12:16   ` Christoffer Dall
  2015-03-10 19:07 ` [PATCH v2 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier
  2015-03-10 19:07 ` [PATCH v2 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier
  2 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2015-03-10 19:06 UTC (permalink / raw)
To: linux-arm-kernel

We're using __get_free_pages to allocate the guest's stage-2 PGD.
The standard behaviour of this function is to return a set of pages
where only the head page has a valid refcount. This behaviour gets
us into trouble when we're trying to increment the refcount on a
non-head page:

page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x4000000000000000()
page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
BUG: failure at include/linux/mm.h:548/get_page()!
Kernel panic - not syncing: BUG!
CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
Hardware name: APM X-Gene Mustang board (DT)
Call trace:
[<ffff80000008a09c>] dump_backtrace+0x0/0x13c
[<ffff80000008a1e8>] show_stack+0x10/0x1c
[<ffff800000691da8>] dump_stack+0x74/0x94
[<ffff800000690d78>] panic+0x100/0x240
[<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
[<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
[<ffff8000000a420c>] handle_exit+0x58/0x180
[<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
[<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
[<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
[<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
CPU0: stopping

A possible approach for this is to split the compound page using
split_page() at allocation time, and change the teardown path to
free one page at a time.

While we're at it, the PGD allocation code is reworked to reduce
duplication.

This has been tested on an X-Gene platform with a 4kB/48bit-VA host
kernel, and kvmtool hacked to place memory in the second page of
the hardware PGD (PUD for the host kernel). Also regression-tested
on a Cubietruck (Cortex-A7).
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  9 ++---
 arch/arm/kvm/mmu.c               | 77 +++++++++++++++++++++++++++++++---------
 arch/arm64/include/asm/kvm_mmu.h | 46 +++---------------------
 3 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 0187606..ff56f91 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -162,18 +162,13 @@ static inline bool kvm_page_empty(void *ptr)
 
 #define KVM_PREALLOC_LEVEL	0
 
-static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
-{
-	return 0;
-}
-
-static inline void kvm_free_hwpgd(struct kvm *kvm) { }
-
 static inline void *kvm_get_hwpgd(struct kvm *kvm)
 {
 	return kvm->arch.pgd;
 }
 
+static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; }
+
 struct kvm;
 
 #define kvm_flush_dcache_to_poc(a,l)	__cpuc_flush_dcache_area((a), (l))
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 69c2b4c..0a5457c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -634,6 +634,31 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
 				 __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
 }
 
+/* Free the HW pgd, one page at a time */
+static void kvm_free_hwpgd(unsigned long hwpgd)
+{
+	int i;
+
+	for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE)
+		free_page(hwpgd + i);
+}
+
+/* Allocate the HW PGD, making sure that each page gets its own refcount */
+static int kvm_alloc_hwpgd(unsigned long *hwpgdp)
+{
+	unsigned long hwpgd;
+	unsigned int order = kvm_get_hwpgd_order();
+
+	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+	if (!hwpgd)
+		return -ENOMEM;
+
+	split_page(virt_to_page((void *)hwpgd), order);
+
+	*hwpgdp = hwpgd;
+	return 0;
+}
+
 /**
  * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
  * @kvm:	The KVM struct pointer for the VM.
@@ -649,13 +674,30 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 {
 	int ret;
 	pgd_t *pgd;
+	unsigned long hwpgd;
 
 	if (kvm->arch.pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
 
+	ret = kvm_alloc_hwpgd(&hwpgd);
+	if (ret)
+		return ret;
+
+	/* When the kernel uses more levels of page tables than the
+	 * guest, we allocate a fake PGD and pre-populate it to point
+	 * to the next-level page table, which will be the real
+	 * initial page table pointed to by the VTTBR.
+	 *
+	 * When KVM_PREALLOC_LEVEL==2, we allocate a single page for
+	 * the PMD and the kernel will use folded pud.
+	 * When KVM_PREALLOC_LEVEL==1, we allocate 2 consecutive PUD
+	 * pages.
+	 */
 	if (KVM_PREALLOC_LEVEL > 0) {
+		int i;
+
 		/*
 		 * Allocate fake pgd for the page table manipulation macros to
 		 * work.  This is not used by the hardware and we have no
@@ -663,30 +705,32 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 		 */
 		pgd = (pgd_t *)kmalloc(PTRS_PER_S2_PGD * sizeof(pgd_t),
 				       GFP_KERNEL | __GFP_ZERO);
+
+		if (!pgd) {
+			kvm_free_hwpgd(hwpgd);
+			return -ENOMEM;
+		}
+
+		/* Plug the HW PGD into the fake one. */
+		for (i = 0; i < PTRS_PER_S2_PGD; i++) {
+			if (KVM_PREALLOC_LEVEL == 1)
+				pgd_populate(NULL, pgd + i,
+					     (pud_t *)hwpgd + i * PTRS_PER_PUD);
+			else if (KVM_PREALLOC_LEVEL == 2)
+				pud_populate(NULL, pud_offset(pgd, 0) + i,
+					     (pmd_t *)hwpgd + i * PTRS_PER_PMD);
+		}
 	} else {
 		/*
 		 * Allocate actual first-level Stage-2 page table used by the
 		 * hardware for Stage-2 page table walks.
 		 */
-		pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER);
+		pgd = (pgd_t *)hwpgd;
 	}
 
-	if (!pgd)
-		return -ENOMEM;
-
-	ret = kvm_prealloc_hwpgd(kvm, pgd);
-	if (ret)
-		goto out_err;
-
 	kvm_clean_pgd(pgd);
 	kvm->arch.pgd = pgd;
 	return 0;
-out_err:
-	if (KVM_PREALLOC_LEVEL > 0)
-		kfree(pgd);
-	else
-		free_pages((unsigned long)pgd, S2_PGD_ORDER);
-	return ret;
 }
 
 /**
@@ -787,11 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 		return;
 
 	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
-	kvm_free_hwpgd(kvm);
+	kvm_free_hwpgd((unsigned long)kvm_get_hwpgd(kvm));
 	if (KVM_PREALLOC_LEVEL > 0)
 		kfree(kvm->arch.pgd);
-	else
-		free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER);
+
 	kvm->arch.pgd = NULL;
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index ab1313f..8354152 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -173,43 +173,6 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 #define KVM_PREALLOC_LEVEL	(0)
 #endif
 
-/**
- * kvm_prealloc_hwpgd - allocate inital table for VTTBR
- * @kvm:	The KVM struct pointer for the VM.
- * @pgd:	The kernel pseudo pgd
- *
- * When the kernel uses more levels of page tables than the guest, we allocate
- * a fake PGD and pre-populate it to point to the next-level page table, which
- * will be the real initial page table pointed to by the VTTBR.
- *
- * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
- * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
- * allocate 2 consecutive PUD pages.
- */
-static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
-{
-	unsigned int i;
-	unsigned long hwpgd;
-
-	if (KVM_PREALLOC_LEVEL == 0)
-		return 0;
-
-	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTRS_PER_S2_PGD_SHIFT);
-	if (!hwpgd)
-		return -ENOMEM;
-
-	for (i = 0; i < PTRS_PER_S2_PGD; i++) {
-		if (KVM_PREALLOC_LEVEL == 1)
-			pgd_populate(NULL, pgd + i,
-				     (pud_t *)hwpgd + i * PTRS_PER_PUD);
-		else if (KVM_PREALLOC_LEVEL == 2)
-			pud_populate(NULL, pud_offset(pgd, 0) + i,
-				     (pmd_t *)hwpgd + i * PTRS_PER_PMD);
-	}
-
-	return 0;
-}
-
 static inline void *kvm_get_hwpgd(struct kvm *kvm)
 {
 	pgd_t *pgd = kvm->arch.pgd;
@@ -226,12 +189,11 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm)
 	return pmd_offset(pud, 0);
 }
 
-static inline void kvm_free_hwpgd(struct kvm *kvm)
+static inline unsigned int kvm_get_hwpgd_order(void)
 {
-	if (KVM_PREALLOC_LEVEL > 0) {
-		unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm);
-		free_pages(hwpgd, PTRS_PER_S2_PGD_SHIFT);
-	}
+	if (KVM_PREALLOC_LEVEL > 0)
+		return PTRS_PER_S2_PGD_SHIFT;
+	return S2_PGD_ORDER;
 }
 
 static inline bool kvm_page_empty(void *ptr)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting 2015-03-10 19:06 ` [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier @ 2015-03-11 12:16 ` Christoffer Dall 2015-03-11 13:13 ` Marc Zyngier 0 siblings, 1 reply; 10+ messages in thread From: Christoffer Dall @ 2015-03-11 12:16 UTC (permalink / raw) To: linux-arm-kernel Hi Marc, On Tue, Mar 10, 2015 at 07:06:59PM +0000, Marc Zyngier wrote: > We're using __get_free_pages with to allocate the guest's stage-2 > PGD. The standard behaviour of this function is to return a set of > pages where only the head page has a valid refcount. > > This behaviour gets us into trouble when we're trying to increment > the refcount on a non-head page: > > page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 > flags: 0x4000000000000000() > page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) > BUG: failure at include/linux/mm.h:548/get_page()! > Kernel panic - not syncing: BUG! 
> CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 > Hardware name: APM X-Gene Mustang board (DT) > Call trace: > [<ffff80000008a09c>] dump_backtrace+0x0/0x13c > [<ffff80000008a1e8>] show_stack+0x10/0x1c > [<ffff800000691da8>] dump_stack+0x74/0x94 > [<ffff800000690d78>] panic+0x100/0x240 > [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc > [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 > [<ffff8000000a420c>] handle_exit+0x58/0x180 > [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c > [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 > [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 > [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 > CPU0: stopping > > A possible approach for this is to split the compound page using > split_page() at allocation time, and change the teardown path to > free one page at a time. > > While we're at it, the PGD allocation code is reworked to reduce > duplication. > > This has been tested on an X-Gene platform with a 4kB/48bit-VA host > kernel, and kvmtool hacked to place memory in the second page of > the hardware PGD (PUD for the host kernel). Also regression-tested > on a Cubietruck (Cortex-A7). 
> > Reported-by: Mark Rutland <mark.rutland@arm.com> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > --- > arch/arm/include/asm/kvm_mmu.h | 9 ++--- > arch/arm/kvm/mmu.c | 77 +++++++++++++++++++++++++++++++--------- > arch/arm64/include/asm/kvm_mmu.h | 46 +++--------------------- > 3 files changed, 66 insertions(+), 66 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h > index 0187606..ff56f91 100644 > --- a/arch/arm/include/asm/kvm_mmu.h > +++ b/arch/arm/include/asm/kvm_mmu.h > @@ -162,18 +162,13 @@ static inline bool kvm_page_empty(void *ptr) > > #define KVM_PREALLOC_LEVEL 0 > > -static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > -{ > - return 0; > -} > - > -static inline void kvm_free_hwpgd(struct kvm *kvm) { } > - > static inline void *kvm_get_hwpgd(struct kvm *kvm) > { > return kvm->arch.pgd; > } > > +static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; } > + > struct kvm; > > #define kvm_flush_dcache_to_poc(a,l) __cpuc_flush_dcache_area((a), (l)) > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 69c2b4c..0a5457c 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -634,6 +634,31 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr) > __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE); > } > > +/* Free the HW pgd, one page at a time */ > +static void kvm_free_hwpgd(unsigned long hwpgd) > +{ > + int i; > + > + for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE) > + free_page(hwpgd + i); > +} > + > +/* Allocate the HW PGD, making sure that each page gets its own refcount */ > +static int kvm_alloc_hwpgd(unsigned long *hwpgdp) I think this can be simplified somewhat by just returning an unsigned long that can be 0 in the error case which will make the caller look a little nicer too. 
> +{
> +	unsigned long hwpgd;
> +	unsigned int order = kvm_get_hwpgd_order();
> +
> +	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> +	if (!hwpgd)
> +		return -ENOMEM;
> +
> +	split_page(virt_to_page((void *)hwpgd), order);

nit: alloc_pages_exact() and free_pages_exact() seem to do this for us,
is it worth using those instead?

It would look something like this on top of your changes:

 arch/arm/include/asm/kvm_mmu.h   |  5 ++++-
 arch/arm/kvm/mmu.c               | 32 ++++++++++----------------------
 arch/arm64/include/asm/kvm_mmu.h |  6 +++---
 3 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fc05ba8..4cf48c3 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -168,7 +168,10 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm)
 	return kvm->arch.pgd;
 }
 
-static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; }
+static inline unsigned int kvm_get_hwpgd_size(void)
+{
+	return PTRS_PER_S2_PGD * sizeof(pgd_t);
+}
 
 struct kvm;
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 8e91bea3..5656d79 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -633,28 +633,17 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
 }
 
 /* Free the HW pgd, one page at a time */
-static void kvm_free_hwpgd(unsigned long hwpgd)
+static void kvm_free_hwpgd(void *hwpgd)
 {
-	int i;
-
-	for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE)
-		free_page(hwpgd + i);
+	free_pages_exact(hwpgd, kvm_get_hwpgd_size());
 }
 
 /* Allocate the HW PGD, making sure that each page gets its own refcount */
-static int kvm_alloc_hwpgd(unsigned long *hwpgdp)
+static void *kvm_alloc_hwpgd(void)
 {
-	unsigned long hwpgd;
-	unsigned int order = kvm_get_hwpgd_order();
-
-	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
-	if (!hwpgd)
-		return -ENOMEM;
-
-	split_page(virt_to_page((void *)hwpgd), order);
+	unsigned int size = kvm_get_hwpgd_size();
 
-	*hwpgdp = hwpgd;
-	return 0;
+	return alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
 }
 
 /**
@@ -670,18 +659,17 @@ static int kvm_alloc_hwpgd(unsigned long *hwpgdp)
  */
 int kvm_alloc_stage2_pgd(struct kvm *kvm)
 {
-	int ret;
 	pgd_t *pgd;
-	unsigned long hwpgd;
+	void *hwpgd;
 
 	if (kvm->arch.pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
 
-	ret = kvm_alloc_hwpgd(&hwpgd);
-	if (ret)
-		return ret;
+	hwpgd = kvm_alloc_hwpgd();
+	if (!hwpgd)
+		return -ENOMEM;
 
 	/* When the kernel uses more levels of page tables than the
 	 * guest, we allocate a fake PGD and pre-populate it to point
@@ -829,7 +817,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 		return;
 
 	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
-	kvm_free_hwpgd((unsigned long)kvm_get_hwpgd(kvm));
+	kvm_free_hwpgd(kvm_get_hwpgd(kvm));
 	if (KVM_PREALLOC_LEVEL > 0)
 		kfree(kvm->arch.pgd);
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 3668110..bbfb600 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -189,11 +189,11 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm)
 	return pmd_offset(pud, 0);
 }
 
-static inline unsigned int kvm_get_hwpgd_order(void)
+static inline unsigned int kvm_get_hwpgd_size(void)
 {
 	if (KVM_PREALLOC_LEVEL > 0)
-		return PTRS_PER_S2_PGD_SHIFT;
-	return S2_PGD_ORDER;
+		return PTRS_PER_S2_PGD * PAGE_SIZE;
+	return PTRS_PER_S2_PGD * sizeof(pgd_t);
 }
 
 static inline bool kvm_page_empty(void *ptr)

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting 2015-03-11 12:16 ` Christoffer Dall @ 2015-03-11 13:13 ` Marc Zyngier 2015-03-11 13:20 ` Christoffer Dall 0 siblings, 1 reply; 10+ messages in thread From: Marc Zyngier @ 2015-03-11 13:13 UTC (permalink / raw) To: linux-arm-kernel Hi Christoffer, On 11/03/15 12:16, Christoffer Dall wrote: > Hi Marc, > > On Tue, Mar 10, 2015 at 07:06:59PM +0000, Marc Zyngier wrote: >> We're using __get_free_pages with to allocate the guest's stage-2 >> PGD. The standard behaviour of this function is to return a set of >> pages where only the head page has a valid refcount. >> >> This behaviour gets us into trouble when we're trying to increment >> the refcount on a non-head page: >> >> page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 >> flags: 0x4000000000000000() >> page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) >> BUG: failure at include/linux/mm.h:548/get_page()! >> Kernel panic - not syncing: BUG! 
>> CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 >> Hardware name: APM X-Gene Mustang board (DT) >> Call trace: >> [<ffff80000008a09c>] dump_backtrace+0x0/0x13c >> [<ffff80000008a1e8>] show_stack+0x10/0x1c >> [<ffff800000691da8>] dump_stack+0x74/0x94 >> [<ffff800000690d78>] panic+0x100/0x240 >> [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc >> [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 >> [<ffff8000000a420c>] handle_exit+0x58/0x180 >> [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c >> [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 >> [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 >> [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 >> CPU0: stopping >> >> A possible approach for this is to split the compound page using >> split_page() at allocation time, and change the teardown path to >> free one page at a time. >> >> While we're at it, the PGD allocation code is reworked to reduce >> duplication. >> >> This has been tested on an X-Gene platform with a 4kB/48bit-VA host >> kernel, and kvmtool hacked to place memory in the second page of >> the hardware PGD (PUD for the host kernel). Also regression-tested >> on a Cubietruck (Cortex-A7). 
>> >> Reported-by: Mark Rutland <mark.rutland@arm.com> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> >> --- >> arch/arm/include/asm/kvm_mmu.h | 9 ++--- >> arch/arm/kvm/mmu.c | 77 +++++++++++++++++++++++++++++++--------- >> arch/arm64/include/asm/kvm_mmu.h | 46 +++--------------------- >> 3 files changed, 66 insertions(+), 66 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index 0187606..ff56f91 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -162,18 +162,13 @@ static inline bool kvm_page_empty(void *ptr) >> >> #define KVM_PREALLOC_LEVEL 0 >> >> -static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >> -{ >> - return 0; >> -} >> - >> -static inline void kvm_free_hwpgd(struct kvm *kvm) { } >> - >> static inline void *kvm_get_hwpgd(struct kvm *kvm) >> { >> return kvm->arch.pgd; >> } >> >> +static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; } >> + >> struct kvm; >> >> #define kvm_flush_dcache_to_poc(a,l) __cpuc_flush_dcache_area((a), (l)) >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 69c2b4c..0a5457c 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -634,6 +634,31 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr) >> __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE); >> } >> >> +/* Free the HW pgd, one page at a time */ >> +static void kvm_free_hwpgd(unsigned long hwpgd) >> +{ >> + int i; >> + >> + for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE) >> + free_page(hwpgd + i); >> +} >> + >> +/* Allocate the HW PGD, making sure that each page gets its own refcount */ >> +static int kvm_alloc_hwpgd(unsigned long *hwpgdp) > > I think this can be simplified somewhat by just returning an unsigned > long that can be 0 in the error case which will make the caller look > a little nicer too. Sure. 
>> +{ >> + unsigned long hwpgd; >> + unsigned int order = kvm_get_hwpgd_order(); >> + >> + hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); >> + if (!hwpgd) >> + return -ENOMEM; >> + >> + split_page(virt_to_page((void *)hwpgd), order); > > nit: alloc_pages_exact() and free_pages_exact() seem to do this for us, > is it worth using those instead? Ah, I didn't know about these. > It would look something like this on top of your changes: > > > arch/arm/include/asm/kvm_mmu.h | 5 ++++- > arch/arm/kvm/mmu.c | 32 ++++++++++---------------------- > arch/arm64/include/asm/kvm_mmu.h | 6 +++--- > 3 files changed, 17 insertions(+), 26 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h > index fc05ba8..4cf48c3 100644 > --- a/arch/arm/include/asm/kvm_mmu.h > +++ b/arch/arm/include/asm/kvm_mmu.h > @@ -168,7 +168,10 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm) > return kvm->arch.pgd; > } > > -static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; } > +static inline unsigned int kvm_get_hwpgd_size(void) > +{ > + return PTRS_PER_S2_PGD * sizeof(pgd_t); > +} > > struct kvm; > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 8e91bea3..5656d79 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -633,28 +633,17 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr) > } > > /* Free the HW pgd, one page at a time */ > -static void kvm_free_hwpgd(unsigned long hwpgd) > +static void kvm_free_hwpgd(void *hwpgd) > { > - int i; > - > - for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE) > - free_page(hwpgd + i); > + free_pages_exact(hwpgd, kvm_get_hwpgd_size()); > } > > /* Allocate the HW PGD, making sure that each page gets its own refcount */ > -static int kvm_alloc_hwpgd(unsigned long *hwpgdp) > +static void *kvm_alloc_hwpgd(void) > { > - unsigned long hwpgd; > - unsigned int order = kvm_get_hwpgd_order(); > - > - hwpgd = __get_free_pages(GFP_KERNEL | 
__GFP_ZERO, order); > - if (!hwpgd) > - return -ENOMEM; > - > - split_page(virt_to_page((void *)hwpgd), order); > + unsigned int size = kvm_get_hwpgd_size(); > > - *hwpgdp = hwpgd; > - return 0; > + return alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO); > } > > /** > @@ -670,18 +659,17 @@ static int kvm_alloc_hwpgd(unsigned long *hwpgdp) > */ > int kvm_alloc_stage2_pgd(struct kvm *kvm) > { > - int ret; > pgd_t *pgd; > - unsigned long hwpgd; > + void *hwpgd; > > if (kvm->arch.pgd != NULL) { > kvm_err("kvm_arch already initialized?\n"); > return -EINVAL; > } > > - ret = kvm_alloc_hwpgd(&hwpgd); > - if (ret) > - return ret; > + hwpgd = kvm_alloc_hwpgd(); > + if (!hwpgd) > + return -ENOMEM; > > /* When the kernel uses more levels of page tables than the > * guest, we allocate a fake PGD and pre-populate it to point > @@ -829,7 +817,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm) > return; > > unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE); > - kvm_free_hwpgd((unsigned long)kvm_get_hwpgd(kvm)); > + kvm_free_hwpgd(kvm_get_hwpgd(kvm)); > if (KVM_PREALLOC_LEVEL > 0) > kfree(kvm->arch.pgd); > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index 3668110..bbfb600 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -189,11 +189,11 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm) > return pmd_offset(pud, 0); > } > > -static inline unsigned int kvm_get_hwpgd_order(void) > +static inline unsigned int kvm_get_hwpgd_size(void) > { > if (KVM_PREALLOC_LEVEL > 0) > - return PTRS_PER_S2_PGD_SHIFT; > - return S2_PGD_ORDER; > + return PTRS_PER_S2_PGD * PAGE_SIZE; > + return PTRS_PER_S2_PGD * sizeof(pgd_t); > } > > static inline bool kvm_page_empty(void *ptr) Looks good to me. I'll fold that into the patch, amend the commit message and resend. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting 2015-03-11 13:13 ` Marc Zyngier @ 2015-03-11 13:20 ` Christoffer Dall 2015-03-11 13:56 ` Marc Zyngier 2015-03-11 14:03 ` Andrew Jones 0 siblings, 2 replies; 10+ messages in thread From: Christoffer Dall @ 2015-03-11 13:20 UTC (permalink / raw) To: linux-arm-kernel On Wed, Mar 11, 2015 at 2:13 PM, Marc Zyngier <marc.zyngier@arm.com> wrote: > Hi Christoffer, > > On 11/03/15 12:16, Christoffer Dall wrote: >> Hi Marc, >> >> On Tue, Mar 10, 2015 at 07:06:59PM +0000, Marc Zyngier wrote: >>> We're using __get_free_pages with to allocate the guest's stage-2 >>> PGD. The standard behaviour of this function is to return a set of >>> pages where only the head page has a valid refcount. >>> >>> This behaviour gets us into trouble when we're trying to increment >>> the refcount on a non-head page: >>> >>> page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 >>> flags: 0x4000000000000000() >>> page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) >>> BUG: failure at include/linux/mm.h:548/get_page()! >>> Kernel panic - not syncing: BUG! 
>>> CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 >>> Hardware name: APM X-Gene Mustang board (DT) >>> Call trace: >>> [<ffff80000008a09c>] dump_backtrace+0x0/0x13c >>> [<ffff80000008a1e8>] show_stack+0x10/0x1c >>> [<ffff800000691da8>] dump_stack+0x74/0x94 >>> [<ffff800000690d78>] panic+0x100/0x240 >>> [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc >>> [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 >>> [<ffff8000000a420c>] handle_exit+0x58/0x180 >>> [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c >>> [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 >>> [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 >>> [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 >>> CPU0: stopping >>> >>> A possible approach for this is to split the compound page using >>> split_page() at allocation time, and change the teardown path to >>> free one page at a time. >>> >>> While we're at it, the PGD allocation code is reworked to reduce >>> duplication. >>> >>> This has been tested on an X-Gene platform with a 4kB/48bit-VA host >>> kernel, and kvmtool hacked to place memory in the second page of >>> the hardware PGD (PUD for the host kernel). Also regression-tested >>> on a Cubietruck (Cortex-A7). 
>>>
>>> Reported-by: Mark Rutland <mark.rutland@arm.com>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  arch/arm/include/asm/kvm_mmu.h   |  9 ++---
>>>  arch/arm/kvm/mmu.c               | 77 +++++++++++++++++++++++++++++++---------
>>>  arch/arm64/include/asm/kvm_mmu.h | 46 +++---------------------
>>>  3 files changed, 66 insertions(+), 66 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>>> index 0187606..ff56f91 100644
>>> --- a/arch/arm/include/asm/kvm_mmu.h
>>> +++ b/arch/arm/include/asm/kvm_mmu.h
>>> @@ -162,18 +162,13 @@ static inline bool kvm_page_empty(void *ptr)
>>>
>>>  #define KVM_PREALLOC_LEVEL	0
>>>
>>> -static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
>>> -{
>>> -	return 0;
>>> -}
>>> -
>>> -static inline void kvm_free_hwpgd(struct kvm *kvm) { }
>>> -
>>>  static inline void *kvm_get_hwpgd(struct kvm *kvm)
>>>  {
>>>  	return kvm->arch.pgd;
>>>  }
>>>
>>> +static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; }
>>> +
>>>  struct kvm;
>>>
>>>  #define kvm_flush_dcache_to_poc(a,l)	__cpuc_flush_dcache_area((a), (l))
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 69c2b4c..0a5457c 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -634,6 +634,31 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
>>>  		 __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
>>>  }
>>>
>>> +/* Free the HW pgd, one page at a time */
>>> +static void kvm_free_hwpgd(unsigned long hwpgd)
>>> +{
>>> +	int i;
>>> +
>>> +	for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE)
>>> +		free_page(hwpgd + i);
>>> +}
>>> +
>>> +/* Allocate the HW PGD, making sure that each page gets its own refcount */
>>> +static int kvm_alloc_hwpgd(unsigned long *hwpgdp)
>>
>> I think this can be simplified somewhat by just returning an unsigned
>> long that can be 0 in the error case which will make the caller look
>> a little nicer too.
>
> Sure.
>
>>> +{
>>> +	unsigned long hwpgd;
>>> +	unsigned int order = kvm_get_hwpgd_order();
>>> +
>>> +	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
>>> +	if (!hwpgd)
>>> +		return -ENOMEM;
>>> +
>>> +	split_page(virt_to_page((void *)hwpgd), order);
>>
>> nit: alloc_pages_exact() and free_pages_exact() seem to do this for us,
>> is it worth using those instead?
>
> Ah, I didn't know about these.
>
>> It would look something like this on top of your changes:
>>
>>
>>  arch/arm/include/asm/kvm_mmu.h   |  5 ++++-
>>  arch/arm/kvm/mmu.c               | 32 ++++++++++----------------------
>>  arch/arm64/include/asm/kvm_mmu.h |  6 +++---
>>  3 files changed, 17 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index fc05ba8..4cf48c3 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -168,7 +168,10 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm)
>>  	return kvm->arch.pgd;
>>  }
>>
>> -static inline unsigned int kvm_get_hwpgd_order(void) { return S2_PGD_ORDER; }
>> +static inline unsigned int kvm_get_hwpgd_size(void)
>> +{
>> +	return PTRS_PER_S2_PGD * sizeof(pgd_t);
>> +}
>>
>>  struct kvm;
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 8e91bea3..5656d79 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -633,28 +633,17 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
>>  }
>>
>>  /* Free the HW pgd, one page at a time */
>> -static void kvm_free_hwpgd(unsigned long hwpgd)
>> +static void kvm_free_hwpgd(void *hwpgd)
>>  {
>> -	int i;
>> -
>> -	for (i = 0; i < (1 << kvm_get_hwpgd_order()); i += PAGE_SIZE)
>> -		free_page(hwpgd + i);
>> +	free_pages_exact(hwpgd, kvm_get_hwpgd_size());
>>  }
>>
>>  /* Allocate the HW PGD, making sure that each page gets its own refcount */
>> -static int kvm_alloc_hwpgd(unsigned long *hwpgdp)
>> +static void *kvm_alloc_hwpgd(void)
>>  {
>> -	unsigned long hwpgd;
>> -	unsigned int order = kvm_get_hwpgd_order();
>> -
>> -	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
>> -	if (!hwpgd)
>> -		return -ENOMEM;
>> -
>> -	split_page(virt_to_page((void *)hwpgd), order);
>> +	unsigned int size = kvm_get_hwpgd_size();
>>
>> -	*hwpgdp = hwpgd;
>> -	return 0;
>> +	return alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
>>  }
>>
>>  /**
>> @@ -670,18 +659,17 @@ static int kvm_alloc_hwpgd(unsigned long *hwpgdp)
>>   */
>>  int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>  {
>> -	int ret;
>>  	pgd_t *pgd;
>> -	unsigned long hwpgd;
>> +	void *hwpgd;
>>
>>  	if (kvm->arch.pgd != NULL) {
>>  		kvm_err("kvm_arch already initialized?\n");
>>  		return -EINVAL;
>>  	}
>>
>> -	ret = kvm_alloc_hwpgd(&hwpgd);
>> -	if (ret)
>> -		return ret;
>> +	hwpgd = kvm_alloc_hwpgd();
>> +	if (!hwpgd)
>> +		return -ENOMEM;
>>
>>  	/* When the kernel uses more levels of page tables than the
>>  	 * guest, we allocate a fake PGD and pre-populate it to point
>> @@ -829,7 +817,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>  		return;
>>
>>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>> -	kvm_free_hwpgd((unsigned long)kvm_get_hwpgd(kvm));
>> +	kvm_free_hwpgd(kvm_get_hwpgd(kvm));
>>  	if (KVM_PREALLOC_LEVEL > 0)
>>  		kfree(kvm->arch.pgd);
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 3668110..bbfb600 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -189,11 +189,11 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm)
>>  	return pmd_offset(pud, 0);
>>  }
>>
>> -static inline unsigned int kvm_get_hwpgd_order(void)
>> +static inline unsigned int kvm_get_hwpgd_size(void)
>>  {
>>  	if (KVM_PREALLOC_LEVEL > 0)
>> -		return PTRS_PER_S2_PGD_SHIFT;
>> -	return S2_PGD_ORDER;
>> +		return PTRS_PER_S2_PGD * PAGE_SIZE;
>> +	return PTRS_PER_S2_PGD * sizeof(pgd_t);
>>  }
>>
>>  static inline bool kvm_page_empty(void *ptr)
>
> Looks good to me. I'll fold that into the patch, amend the commit
> message and resend.
>

I've done that here:
https://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git/shortlog/refs/heads/pgd-fixes-alt

(except for the commit message, which I can also adjust).

I gave this a quick spin on Juno and TC2 (but couldn't test 64K pages
because I don't have a platform for that handy this very second)...

-Christoffer
* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
  2015-03-11 13:20 ` Christoffer Dall
@ 2015-03-11 13:56 ` Marc Zyngier
  2015-03-11 14:03 ` Andrew Jones
  1 sibling, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-03-11 13:56 UTC (permalink / raw)
To: linux-arm-kernel

On 11/03/15 13:20, Christoffer Dall wrote:
> On Wed, Mar 11, 2015 at 2:13 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> Looks good to me. I'll fold that into the patch, amend the commit
>> message and resend.
>>
>
> I've done that here:
> https://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git/shortlog/refs/heads/pgd-fixes-alt
>
> (except for the commit message, which I can also adjust).

Looks great, thanks.

> I gave this a quick spin on Juno and TC2 (but couldn't test 64K pages
> because I don't have a platform for that handy this very second)...

I can do that if you want.

	M.
--
Jazz is not dead. It just smells funny...
* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
  2015-03-11 13:20 ` Christoffer Dall
  2015-03-11 13:56 ` Marc Zyngier
@ 2015-03-11 14:03 ` Andrew Jones
  2015-03-11 14:25 ` Christoffer Dall
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Jones @ 2015-03-11 14:03 UTC (permalink / raw)
To: linux-arm-kernel

On Wed, Mar 11, 2015 at 02:20:56PM +0100, Christoffer Dall wrote:
> On Wed, Mar 11, 2015 at 2:13 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:

[...]

> I've done that here:
> https://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git/shortlog/refs/heads/pgd-fixes-alt
>
> (except for the commit message, which I can also adjust).
>
> I gave this a quick spin on Juno and TC2 (but couldn't test 64K pages
> because I don't have a platform for that handy this very second)...

Tested with 64K config. Works for me (test was just a guest boot)

drew

> -Christoffer
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
* [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
  2015-03-11 14:03 ` Andrew Jones
@ 2015-03-11 14:25 ` Christoffer Dall
  0 siblings, 0 replies; 10+ messages in thread
From: Christoffer Dall @ 2015-03-11 14:25 UTC (permalink / raw)
To: linux-arm-kernel

On Wed, Mar 11, 2015 at 3:03 PM, Andrew Jones <drjones@redhat.com> wrote:
> On Wed, Mar 11, 2015 at 02:20:56PM +0100, Christoffer Dall wrote:

[...]

>> I gave this a quick spin on Juno and TC2 (but couldn't test 64K pages
>> because I don't have a platform for that handy this very second)...
>
> Tested with 64K config. Works for me (test was just a guest boot)
>

Thanks, applied.

-Christoffer
* [PATCH v2 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd
  2015-03-10 19:06 [PATCH v2 0/3] arm64: KVM: High memory guest fixes Marc Zyngier
  2015-03-10 19:06 ` [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier
@ 2015-03-10 19:07 ` Marc Zyngier
  2015-03-10 19:07 ` [PATCH v2 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier
  2 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-03-10 19:07 UTC (permalink / raw)
To: linux-arm-kernel

The kernel's pgd_index macro is designed to index a normal, page-sized
array. KVM is a bit different, as we can use concatenated pages to have
a bigger address space (for example, a 40bit IPA with 4kB pages gives us
an 8kB PGD).

In the above case, the use of pgd_index will always return an index
inside the first 4kB, which makes a guest that has memory above
0x8000000000 rather unhappy, as it spins forever in a page fault,
whilst the host happily corrupts the lower pgd.

The obvious fix is to get our own kvm_pgd_index that does the right
thing(tm).

Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
high address.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 3 ++-
 arch/arm/kvm/mmu.c               | 8 ++++----
 arch/arm64/include/asm/kvm_mmu.h | 2 ++
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ff56f91..9486a7a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -149,13 +149,14 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	(__boundary - 1 < (end) - 1)? __boundary: (end);		\
 })

+#define kvm_pgd_index(addr)	pgd_index(addr)
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
 	return page_count(ptr_page) == 1;
 }

-
 #define kvm_pte_table_empty(kvm, ptep) kvm_page_empty(ptep)
 #define kvm_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
 #define kvm_pud_table_empty(kvm, pudp) (0)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 0a5457c..f99543d 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -290,7 +290,7 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;

-	pgd = pgdp + pgd_index(addr);
+	pgd = pgdp + kvm_pgd_index(addr);
 	do {
 		next = kvm_pgd_addr_end(addr, end);
 		if (!pgd_none(*pgd))
@@ -355,7 +355,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
 	phys_addr_t next;
 	pgd_t *pgd;

-	pgd = kvm->arch.pgd + pgd_index(addr);
+	pgd = kvm->arch.pgd + kvm_pgd_index(addr);
 	do {
 		next = kvm_pgd_addr_end(addr, end);
 		stage2_flush_puds(kvm, pgd, addr, next);
@@ -844,7 +844,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	pgd_t *pgd;
 	pud_t *pud;

-	pgd = kvm->arch.pgd + pgd_index(addr);
+	pgd = kvm->arch.pgd + kvm_pgd_index(addr);
 	if (WARN_ON(pgd_none(*pgd))) {
 		if (!cache)
 			return NULL;
@@ -1134,7 +1134,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 	pgd_t *pgd;
 	phys_addr_t next;

-	pgd = kvm->arch.pgd + pgd_index(addr);
+	pgd = kvm->arch.pgd + kvm_pgd_index(addr);
 	do {
 		/*
 		 * Release kvm_mmu_lock periodically if the memory region is
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8354152..d8f809f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -160,6 +160,8 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 #define PTRS_PER_S2_PGD		(1 << PTRS_PER_S2_PGD_SHIFT)
 #define S2_PGD_ORDER		get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))

+#define kvm_pgd_index(addr)	(((addr) >> PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))
+
 /*
  * If we are concatenating first level stage-2 page tables, we would have less
  * than or equal to 16 pointers in the fake PGD, because that's what the
--
2.1.4
* [PATCH v2 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS
  2015-03-10 19:06 [PATCH v2 0/3] arm64: KVM: High memory guest fixes Marc Zyngier
  2015-03-10 19:06 ` [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier
  2015-03-10 19:07 ` [PATCH v2 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier
@ 2015-03-10 19:07 ` Marc Zyngier
  2 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-03-10 19:07 UTC (permalink / raw)
To: linux-arm-kernel

Commit 87366d8cf7b3 ("arm64: Add boot time configuration of
Intermediate Physical Address size") removed the hardcoded setting
of VTCR_EL2.PS to use ID_AA64MMFR0_EL1.PARange instead, but didn't
remove the (now rather misleading) comment.

Fix the comments to match reality (at least for the next few
minutes).

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 94674eb..54bb4ba 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -129,6 +129,9 @@
  * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
+ * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
+ * (see hyp-init.S).
+ *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together.
  *
@@ -138,7 +141,6 @@
 #ifdef CONFIG_ARM64_64K_PAGES
 /*
  * Stage2 translation configuration:
- * 40bits output (PS = 2)
  * 40bits input (T0SZ = 24)
  * 64kB pages (TG0 = 1)
  * 2 level page tables (SL = 1)
@@ -150,7 +152,6 @@
 #else
 /*
  * Stage2 translation configuration:
- * 40bits output (PS = 2)
  * 40bits input (T0SZ = 24)
  * 4kB pages (TG0 = 0)
  * 3 level page tables (SL = 1)
--
2.1.4
end of thread, other threads:[~2015-03-11 14:25 UTC | newest]

Thread overview: 10+ messages
2015-03-10 19:06 [PATCH v2 0/3] arm64: KVM: High memory guest fixes Marc Zyngier
2015-03-10 19:06 ` [PATCH v2 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier
2015-03-11 12:16   ` Christoffer Dall
2015-03-11 13:13     ` Marc Zyngier
2015-03-11 13:20       ` Christoffer Dall
2015-03-11 13:56         ` Marc Zyngier
2015-03-11 14:03         ` Andrew Jones
2015-03-11 14:25           ` Christoffer Dall
2015-03-10 19:07 ` [PATCH v2 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier
2015-03-10 19:07 ` [PATCH v2 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier