* [PATCH 0/3] arm64: KVM: High memory guest fixes
@ 2015-02-25 16:55 Marc Zyngier
2015-02-25 16:55 ` [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Marc Zyngier @ 2015-02-25 16:55 UTC (permalink / raw)
To: linux-arm-kernel
I've been playing with a hacked-up kvmtool that places the memory much
higher than usual, in an effort to emulate what some arm64 platforms
do (typically AMD Seattle).
Unexpectedly (hey, what could possibly go wrong...), I've discovered a
couple of mm related issues, ranging from the host exploding to the
guest being stuck on a page fault. Additionally, I noticed a minor
documentation issue.
With these fixes in, I'm able to run a guest that has its memory
located just under the 40bit limit.
Marc Zyngier (3):
arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
arm64: KVM: Do not use pgd_index to index stage-2 pgd
arm64: KVM: Fix outdated comment about VTCR_EL2.PS
arch/arm/include/asm/kvm_mmu.h | 10 +++++++++-
arch/arm/kvm/mmu.c | 10 +++++-----
arch/arm64/include/asm/kvm_arm.h | 5 +++--
arch/arm64/include/asm/kvm_mmu.h | 11 ++++++++++-
4 files changed, 27 insertions(+), 9 deletions(-)
--
2.1.4
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting 2015-02-25 16:55 [PATCH 0/3] arm64: KVM: High memory guest fixes Marc Zyngier @ 2015-02-25 16:55 ` Marc Zyngier 2015-03-02 18:27 ` Christoffer Dall 2015-02-25 16:55 ` [PATCH 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier 2015-02-25 16:55 ` [PATCH 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier 2 siblings, 1 reply; 8+ messages in thread From: Marc Zyngier @ 2015-02-25 16:55 UTC (permalink / raw) To: linux-arm-kernel We're using __get_free_pages with to allocate the guest's stage-2 PGD. The standard behaviour of this function is to return a set of pages where only the head page has a valid refcount. This behaviour gets us into trouble when we're trying to increment the refount on a non-head page: page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) BUG: failure@include/linux/mm.h:548/get_page()! Kernel panic - not syncing: BUG! CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 Hardware name: APM X-Gene Mustang board (DT) Call trace: [<ffff80000008a09c>] dump_backtrace+0x0/0x13c [<ffff80000008a1e8>] show_stack+0x10/0x1c [<ffff800000691da8>] dump_stack+0x74/0x94 [<ffff800000690d78>] panic+0x100/0x240 [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 [<ffff8000000a420c>] handle_exit+0x58/0x180 [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 CPU0: stopping Passing the (unintuitively named) __GFP_COMP flag to __get_free_pages forces the allocator to maintain a per-page refcount, which is exactly what we need. This has been tested on an X-Gene platform with a 4kB/48bit-VA host kernel, and kvmtool hacked to place memory in the second page of the hardware PGD (PUD for the host kernel). Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> --- arch/arm/include/asm/kvm_mmu.h | 7 +++++++ arch/arm/kvm/mmu.c | 2 +- arch/arm64/include/asm/kvm_mmu.h | 9 ++++++++- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 37ca2a4..1cac89b 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -162,6 +162,13 @@ static inline bool kvm_page_empty(void *ptr) #define KVM_PREALLOC_LEVEL 0 +/* + * We need to ensure that stage-2 PGDs are allocated with a per-page + * refcount, as we fiddle with the refcounts of non-head pages. + * __GFP_COMP forces the allocator to do what we want. + */ +#define KVM_GFP_S2_PGD (GFP_KERNEL | __GFP_ZERO | __GFP_COMP) + static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) { return 0; diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 3e6859b..a6a8252 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -666,7 +666,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) * Allocate actual first-level Stage-2 page table used by the * hardware for Stage-2 page table walks. */ - pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER); + pgd = (pgd_t *)__get_free_pages(KVM_GFP_S2_PGD, S2_PGD_ORDER); } if (!pgd) diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 6458b53..06c733a 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -171,6 +171,13 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) #define KVM_PREALLOC_LEVEL (0) #endif +/* + * We need to ensure that stage-2 PGDs are allocated with a per-page + * refcount, as we fiddle with the refcounts of non-head pages. + * __GFP_COMP forces the allocator to do what we want. + */ +#define KVM_GFP_S2_PGD (GFP_KERNEL | __GFP_ZERO | __GFP_COMP) + /** * kvm_prealloc_hwpgd - allocate inital table for VTTBR * @kvm: The KVM struct pointer for the VM. @@ -192,7 +199,7 @@ static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) if (KVM_PREALLOC_LEVEL == 0) return 0; - hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTRS_PER_S2_PGD_SHIFT); + hwpgd = __get_free_pages(KVM_GFP_S2_PGD, PTRS_PER_S2_PGD_SHIFT); if (!hwpgd) return -ENOMEM; -- 2.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting 2015-02-25 16:55 ` [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier @ 2015-03-02 18:27 ` Christoffer Dall 2015-03-05 13:58 ` Marc Zyngier 0 siblings, 1 reply; 8+ messages in thread From: Christoffer Dall @ 2015-03-02 18:27 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 25, 2015 at 04:55:38PM +0000, Marc Zyngier wrote: > We're using __get_free_pages with to allocate the guest's stage-2 > PGD. The standard behaviour of this function is to return a set of > pages where only the head page has a valid refcount. > > This behaviour gets us into trouble when we're trying to increment > the refount on a non-head page: > > page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 > flags: 0x4000000000000000() > page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) > BUG: failure at include/linux/mm.h:548/get_page()! > Kernel panic - not syncing: BUG! > CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 > Hardware name: APM X-Gene Mustang board (DT) > Call trace: > [<ffff80000008a09c>] dump_backtrace+0x0/0x13c > [<ffff80000008a1e8>] show_stack+0x10/0x1c > [<ffff800000691da8>] dump_stack+0x74/0x94 > [<ffff800000690d78>] panic+0x100/0x240 > [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc > [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 > [<ffff8000000a420c>] handle_exit+0x58/0x180 > [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c > [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 > [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 > [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 > CPU0: stopping > > Passing the (unintuitively named) __GFP_COMP flag to __get_free_pages > forces the allocator to maintain a per-page refcount, which is exactly > what we need. There's a concerning comment with what seems to be the same kind of scenario in arch/tile/mm/pgtable.c:248. I'm a little confused here, because it looks like prep_new_page() calls prep_compound_page(), which is the effect you're looking for, but it sets the page count of all tail pages to 0, and I think our code expects all page table pages with all entries clear to have a page count of 1, i.e. the allocation count itself, so don't we still have a discrepency between head and tail pages for compound allocations? So I'm wondering if we should simply call split_page() on the allocation or if there's some other way of doing individual multiple contigous page allocations? Am I seeing ghosts? > > This has been tested on an X-Gene platform with a 4kB/48bit-VA host > kernel, and kvmtool hacked to place memory in the second page of > the hardware PGD (PUD for the host kernel). > > Reported-by: Mark Rutland <mark.rutland@arm.com> > Tested-by: Mark Rutland <mark.rutland@arm.com> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > --- > arch/arm/include/asm/kvm_mmu.h | 7 +++++++ > arch/arm/kvm/mmu.c | 2 +- > arch/arm64/include/asm/kvm_mmu.h | 9 ++++++++- > 3 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h > index 37ca2a4..1cac89b 100644 > --- a/arch/arm/include/asm/kvm_mmu.h > +++ b/arch/arm/include/asm/kvm_mmu.h > @@ -162,6 +162,13 @@ static inline bool kvm_page_empty(void *ptr) > > #define KVM_PREALLOC_LEVEL 0 > > +/* > + * We need to ensure that stage-2 PGDs are allocated with a per-page > + * refcount, as we fiddle with the refcounts of non-head pages. > + * __GFP_COMP forces the allocator to do what we want. > + */ > +#define KVM_GFP_S2_PGD (GFP_KERNEL | __GFP_ZERO | __GFP_COMP) > + > static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > { what about the case for kvm_prealloc_hwpgd, why doesn't it need to use the new GFP mask? > return 0; > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index 3e6859b..a6a8252 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -666,7 +666,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) > * Allocate actual first-level Stage-2 page table used by the > * hardware for Stage-2 page table walks. > */ > - pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER); > + pgd = (pgd_t *)__get_free_pages(KVM_GFP_S2_PGD, S2_PGD_ORDER); > } > > if (!pgd) > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index 6458b53..06c733a 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -171,6 +171,13 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) > #define KVM_PREALLOC_LEVEL (0) > #endif > > +/* > + * We need to ensure that stage-2 PGDs are allocated with a per-page > + * refcount, as we fiddle with the refcounts of non-head pages. > + * __GFP_COMP forces the allocator to do what we want. > + */ > +#define KVM_GFP_S2_PGD (GFP_KERNEL | __GFP_ZERO | __GFP_COMP) > + > /** > * kvm_prealloc_hwpgd - allocate inital table for VTTBR > * @kvm: The KVM struct pointer for the VM. > @@ -192,7 +199,7 @@ static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) > if (KVM_PREALLOC_LEVEL == 0) > return 0; > > - hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTRS_PER_S2_PGD_SHIFT); > + hwpgd = __get_free_pages(KVM_GFP_S2_PGD, PTRS_PER_S2_PGD_SHIFT); > if (!hwpgd) > return -ENOMEM; > > -- > 2.1.4 > Thanks, -Christoffer ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting 2015-03-02 18:27 ` Christoffer Dall @ 2015-03-05 13:58 ` Marc Zyngier 0 siblings, 0 replies; 8+ messages in thread From: Marc Zyngier @ 2015-03-05 13:58 UTC (permalink / raw) To: linux-arm-kernel On 02/03/15 18:27, Christoffer Dall wrote: > On Wed, Feb 25, 2015 at 04:55:38PM +0000, Marc Zyngier wrote: >> We're using __get_free_pages with to allocate the guest's stage-2 >> PGD. The standard behaviour of this function is to return a set of >> pages where only the head page has a valid refcount. >> >> This behaviour gets us into trouble when we're trying to increment >> the refount on a non-head page: >> >> page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 >> flags: 0x4000000000000000() >> page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) >> BUG: failure at include/linux/mm.h:548/get_page()! >> Kernel panic - not syncing: BUG! >> CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 >> Hardware name: APM X-Gene Mustang board (DT) >> Call trace: >> [<ffff80000008a09c>] dump_backtrace+0x0/0x13c >> [<ffff80000008a1e8>] show_stack+0x10/0x1c >> [<ffff800000691da8>] dump_stack+0x74/0x94 >> [<ffff800000690d78>] panic+0x100/0x240 >> [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc >> [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 >> [<ffff8000000a420c>] handle_exit+0x58/0x180 >> [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c >> [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 >> [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 >> [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 >> CPU0: stopping >> >> Passing the (unintuitively named) __GFP_COMP flag to __get_free_pages >> forces the allocator to maintain a per-page refcount, which is exactly >> what we need. > > There's a concerning comment with what seems to be the same kind of > scenario in arch/tile/mm/pgtable.c:248. > > I'm a little confused here, because it looks like prep_new_page() calls > prep_compound_page(), which is the effect you're looking for, but it > sets the page count of all tail pages to 0, and I think our code expects > all page table pages with all entries clear to have a page count of 1, > i.e. the allocation count itself, so don't we still have a discrepency > between head and tail pages for compound allocations? I wondered about that, but couldn't decide either way. If that was the case, we'd trigger some nasty warnings when tearing down a VM, and I couldn't see any. > So I'm wondering if we should simply call split_page() on the allocation > or if there's some other way of doing individual multiple contigous page > allocations? Am I seeing ghosts? Could well be a nicer solution. At least I can understand the code. I'll give it a go. >> >> This has been tested on an X-Gene platform with a 4kB/48bit-VA host >> kernel, and kvmtool hacked to place memory in the second page of >> the hardware PGD (PUD for the host kernel). >> >> Reported-by: Mark Rutland <mark.rutland@arm.com> >> Tested-by: Mark Rutland <mark.rutland@arm.com> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> >> --- >> arch/arm/include/asm/kvm_mmu.h | 7 +++++++ >> arch/arm/kvm/mmu.c | 2 +- >> arch/arm64/include/asm/kvm_mmu.h | 9 ++++++++- >> 3 files changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index 37ca2a4..1cac89b 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -162,6 +162,13 @@ static inline bool kvm_page_empty(void *ptr) >> >> #define KVM_PREALLOC_LEVEL 0 >> >> +/* >> + * We need to ensure that stage-2 PGDs are allocated with a per-page >> + * refcount, as we fiddle with the refcounts of non-head pages. >> + * __GFP_COMP forces the allocator to do what we want. >> + */ >> +#define KVM_GFP_S2_PGD (GFP_KERNEL | __GFP_ZERO | __GFP_COMP) >> + >> static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >> { > > what about the case for kvm_prealloc_hwpgd, why doesn't it need to use > the new GFP mask? I don't think so. It is always at most a page. >> return 0; >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 3e6859b..a6a8252 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -666,7 +666,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) >> * Allocate actual first-level Stage-2 page table used by the >> * hardware for Stage-2 page table walks. >> */ >> - pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER); >> + pgd = (pgd_t *)__get_free_pages(KVM_GFP_S2_PGD, S2_PGD_ORDER); >> } >> >> if (!pgd) >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h >> index 6458b53..06c733a 100644 >> --- a/arch/arm64/include/asm/kvm_mmu.h >> +++ b/arch/arm64/include/asm/kvm_mmu.h >> @@ -171,6 +171,13 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) >> #define KVM_PREALLOC_LEVEL (0) >> #endif >> >> +/* >> + * We need to ensure that stage-2 PGDs are allocated with a per-page >> + * refcount, as we fiddle with the refcounts of non-head pages. >> + * __GFP_COMP forces the allocator to do what we want. >> + */ >> +#define KVM_GFP_S2_PGD (GFP_KERNEL | __GFP_ZERO | __GFP_COMP) >> + >> /** >> * kvm_prealloc_hwpgd - allocate inital table for VTTBR >> * @kvm: The KVM struct pointer for the VM. >> @@ -192,7 +199,7 @@ static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd) >> if (KVM_PREALLOC_LEVEL == 0) >> return 0; >> >> - hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTRS_PER_S2_PGD_SHIFT); >> + hwpgd = __get_free_pages(KVM_GFP_S2_PGD, PTRS_PER_S2_PGD_SHIFT); >> if (!hwpgd) >> return -ENOMEM; >> >> -- >> 2.1.4 >> Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd 2015-02-25 16:55 [PATCH 0/3] arm64: KVM: High memory guest fixes Marc Zyngier 2015-02-25 16:55 ` [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier @ 2015-02-25 16:55 ` Marc Zyngier 2015-03-02 18:45 ` Christoffer Dall 2015-02-25 16:55 ` [PATCH 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier 2 siblings, 1 reply; 8+ messages in thread From: Marc Zyngier @ 2015-02-25 16:55 UTC (permalink / raw) To: linux-arm-kernel The kernel's pgd_index macro is designed to index a normal, page sized array. KVM is a bit diffferent, as we can use concatenated pages to have a bigger address space (for example 40bit IPA with 4kB pages gives us an 8kB PGD. In the above case, the use of pgd_index will always return an index inside the first 4kB, which makes a guest that has memory above 0x8000000000 rather unhappy, as it spins forever in a page fault, whist the host happilly corrupts the lower pgd. The obvious fix is to get our own kvm_pgd_index that does the right thing(tm). Tested on X-Gene with a hacked kvmtool that put memory at a stupidly high address. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> --- arch/arm/include/asm/kvm_mmu.h | 3 ++- arch/arm/kvm/mmu.c | 8 ++++---- arch/arm64/include/asm/kvm_mmu.h | 2 ++ 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 1cac89b..ec669a6 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -149,13 +149,14 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) (__boundary - 1 < (end) - 1)? __boundary: (end); \ }) +#define kvm_pgd_index(addr) pgd_index(addr) + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); return page_count(ptr_page) == 1; } - #define kvm_pte_table_empty(kvm, ptep) kvm_page_empty(ptep) #define kvm_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) #define kvm_pud_table_empty(kvm, pudp) (0) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index a6a8252..39a0903 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -290,7 +290,7 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, phys_addr_t addr = start, end = start + size; phys_addr_t next; - pgd = pgdp + pgd_index(addr); + pgd = pgdp + kvm_pgd_index(addr); do { next = kvm_pgd_addr_end(addr, end); if (!pgd_none(*pgd)) @@ -355,7 +355,7 @@ static void stage2_flush_memslot(struct kvm *kvm, phys_addr_t next; pgd_t *pgd; - pgd = kvm->arch.pgd + pgd_index(addr); + pgd = kvm->arch.pgd + kvm_pgd_index(addr); do { next = kvm_pgd_addr_end(addr, end); stage2_flush_puds(kvm, pgd, addr, next); @@ -799,7 +799,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache pgd_t *pgd; pud_t *pud; - pgd = kvm->arch.pgd + pgd_index(addr); + pgd = kvm->arch.pgd + kvm_pgd_index(addr); if (WARN_ON(pgd_none(*pgd))) { if (!cache) return NULL; @@ -1089,7 +1089,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) pgd_t *pgd; phys_addr_t next; - pgd = kvm->arch.pgd + pgd_index(addr); + pgd = kvm->arch.pgd + kvm_pgd_index(addr); do { /* * Release kvm_mmu_lock periodically if the memory region is diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 06c733a..c6300fd 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -158,6 +158,8 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) #define PTRS_PER_S2_PGD (1 << PTRS_PER_S2_PGD_SHIFT) #define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) +#define kvm_pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) + /* * If we are concatenating first level stage-2 page tables, we would have less * than or equal to 16 pointers in the fake PGD, because that's what the -- 2.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd 2015-02-25 16:55 ` [PATCH 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier @ 2015-03-02 18:45 ` Christoffer Dall 0 siblings, 0 replies; 8+ messages in thread From: Christoffer Dall @ 2015-03-02 18:45 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 25, 2015 at 04:55:39PM +0000, Marc Zyngier wrote: > The kernel's pgd_index macro is designed to index a normal, page > sized array. KVM is a bit diffferent, as we can use concatenated > pages to have a bigger address space (for example 40bit IPA with > 4kB pages gives us an 8kB PGD. > > In the above case, the use of pgd_index will always return an index > inside the first 4kB, which makes a guest that has memory above > 0x8000000000 rather unhappy, as it spins forever in a page fault, > whist the host happilly corrupts the lower pgd. > > The obvious fix is to get our own kvm_pgd_index that does the right > thing(tm). > > Tested on X-Gene with a hacked kvmtool that put memory at a stupidly > high address. > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > --- > arch/arm/include/asm/kvm_mmu.h | 3 ++- > arch/arm/kvm/mmu.c | 8 ++++---- > arch/arm64/include/asm/kvm_mmu.h | 2 ++ > 3 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h > index 1cac89b..ec669a6 100644 > --- a/arch/arm/include/asm/kvm_mmu.h > +++ b/arch/arm/include/asm/kvm_mmu.h > @@ -149,13 +149,14 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) > (__boundary - 1 < (end) - 1)? __boundary: (end); \ > }) > > +#define kvm_pgd_index(addr) pgd_index(addr) > + > static inline bool kvm_page_empty(void *ptr) > { > struct page *ptr_page = virt_to_page(ptr); > return page_count(ptr_page) == 1; > } > > - > #define kvm_pte_table_empty(kvm, ptep) kvm_page_empty(ptep) > #define kvm_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) > #define kvm_pud_table_empty(kvm, pudp) (0) > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c > index a6a8252..39a0903 100644 > --- a/arch/arm/kvm/mmu.c > +++ b/arch/arm/kvm/mmu.c > @@ -290,7 +290,7 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, > phys_addr_t addr = start, end = start + size; > phys_addr_t next; > > - pgd = pgdp + pgd_index(addr); > + pgd = pgdp + kvm_pgd_index(addr); > do { > next = kvm_pgd_addr_end(addr, end); > if (!pgd_none(*pgd)) > @@ -355,7 +355,7 @@ static void stage2_flush_memslot(struct kvm *kvm, > phys_addr_t next; > pgd_t *pgd; > > - pgd = kvm->arch.pgd + pgd_index(addr); > + pgd = kvm->arch.pgd + kvm_pgd_index(addr); > do { > next = kvm_pgd_addr_end(addr, end); > stage2_flush_puds(kvm, pgd, addr, next); > @@ -799,7 +799,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache > pgd_t *pgd; > pud_t *pud; > > - pgd = kvm->arch.pgd + pgd_index(addr); > + pgd = kvm->arch.pgd + kvm_pgd_index(addr); > if (WARN_ON(pgd_none(*pgd))) { > if (!cache) > return NULL; > @@ -1089,7 +1089,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > pgd_t *pgd; > phys_addr_t next; > > - pgd = kvm->arch.pgd + pgd_index(addr); > + pgd = kvm->arch.pgd + kvm_pgd_index(addr); > do { > /* > * Release kvm_mmu_lock periodically if the memory region is > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h > index 06c733a..c6300fd 100644 > --- a/arch/arm64/include/asm/kvm_mmu.h > +++ b/arch/arm64/include/asm/kvm_mmu.h > @@ -158,6 +158,8 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd) > #define PTRS_PER_S2_PGD (1 << PTRS_PER_S2_PGD_SHIFT) > #define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t)) > > +#define kvm_pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) > + > /* > * If we are concatenating first level stage-2 page tables, we would have less > * than or equal to 16 pointers in the fake PGD, because that's what the > -- > 2.1.4 > Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS 2015-02-25 16:55 [PATCH 0/3] arm64: KVM: High memory guest fixes Marc Zyngier 2015-02-25 16:55 ` [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier 2015-02-25 16:55 ` [PATCH 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier @ 2015-02-25 16:55 ` Marc Zyngier 2015-03-02 18:52 ` Christoffer Dall 2 siblings, 1 reply; 8+ messages in thread From: Marc Zyngier @ 2015-02-25 16:55 UTC (permalink / raw) To: linux-arm-kernel Commit 87366d8cf7b3 ("arm64: Add boot time configuration of Intermediate Physical Address size") removed the hardcoded setting of VTCR_EL2.PS to use ID_AA64MMFR0_EL1.PARange instead, but didn't remove the (now rather misleading) comment. Fix the comments to match reality (at least for the next few minutes). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> --- arch/arm64/include/asm/kvm_arm.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 94674eb..54bb4ba 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -129,6 +129,9 @@ * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are * not known to exist and will break with this configuration. * + * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time + * (see hyp-init.S). + * * Note that when using 4K pages, we concatenate two first level page tables * together. * @@ -138,7 +141,6 @@ #ifdef CONFIG_ARM64_64K_PAGES /* * Stage2 translation configuration: - * 40bits output (PS = 2) * 40bits input (T0SZ = 24) * 64kB pages (TG0 = 1) * 2 level page tables (SL = 1) @@ -150,7 +152,6 @@ #else /* * Stage2 translation configuration: - * 40bits output (PS = 2) * 40bits input (T0SZ = 24) * 4kB pages (TG0 = 0) * 3 level page tables (SL = 1) -- 2.1.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS 2015-02-25 16:55 ` [PATCH 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier @ 2015-03-02 18:52 ` Christoffer Dall 0 siblings, 0 replies; 8+ messages in thread From: Christoffer Dall @ 2015-03-02 18:52 UTC (permalink / raw) To: linux-arm-kernel On Wed, Feb 25, 2015 at 04:55:40PM +0000, Marc Zyngier wrote: > Commit 87366d8cf7b3 ("arm64: Add boot time configuration of > Intermediate Physical Address size") removed the hardcoded setting > of VTCR_EL2.PS to use ID_AA64MMFR0_EL1.PARange instead, but didn't > remove the (now rather misleading) comment. > > Fix the comments to match reality (at least for the next few minutes). > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-03-05 13:58 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-02-25 16:55 [PATCH 0/3] arm64: KVM: High memory guest fixes Marc Zyngier 2015-02-25 16:55 ` [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting Marc Zyngier 2015-03-02 18:27 ` Christoffer Dall 2015-03-05 13:58 ` Marc Zyngier 2015-02-25 16:55 ` [PATCH 2/3] arm64: KVM: Do not use pgd_index to index stage-2 pgd Marc Zyngier 2015-03-02 18:45 ` Christoffer Dall 2015-02-25 16:55 ` [PATCH 3/3] arm64: KVM: Fix outdated comment about VTCR_EL2.PS Marc Zyngier 2015-03-02 18:52 ` Christoffer Dall
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).