* [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
@ 2020-08-22 2:44 Gavin Shan
2020-08-22 10:01 ` Marc Zyngier
2020-09-02 10:59 ` Alexandru Elisei
0 siblings, 2 replies; 12+ messages in thread
From: Gavin Shan @ 2020-08-22 2:44 UTC (permalink / raw)
To: kvmarm; +Cc: shan.gavin, maz
Depending on the kernel configuration, PUD_SIZE could be equal to
PMD_SIZE. For example, both of them are 512MB with the following
kernel configuration. In this case, both PUD and PMD are folded
to PGD.
CONFIG_ARM64_64K_PAGES y
CONFIG_ARM64_VA_BITS 42
CONFIG_PGTABLE_LEVELS 2
With the above configuration, a stage2 PUD is used to back the
512MB huge page when the stage2 mapping is built. During the mapping,
the PUD and its subordinate levels of page table entries are unmapped
in stage2_set_pud_huge() if the PUD is present but is not a huge page.
Unfortunately, @addr isn't aligned to S2_PUD_SIZE, so the wrong page
table entries are zapped. As a result, the PUD's present bit can never
be cleared, and stage2_set_pud_huge() spins in an infinite loop.
Fix the issue by checking against S2_{PUD, PMD}_SIZE instead of
{PUD, PMD}_SIZE to determine whether a stage2 PUD or PMD is used to
back the huge page. For this particular configuration, the stage2 PMD
entry should be used to back the 512MB huge page via stage2_set_pmd_huge().
Fixes: 3c3736cd32bf ("KVM: arm/arm64: Fix handling of stage2 huge mappings")
Cc: stable@vger.kernel.org # v5.1+
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
arch/arm64/kvm/mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0121ef2c7c8d..deb1819ba9e2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1964,7 +1964,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		      (fault_status == FSC_PERM &&
 		       stage2_is_exec(mmu, fault_ipa, vma_pagesize));
 
-	if (vma_pagesize == PUD_SIZE) {
+	if (vma_pagesize == S2_PUD_SIZE) {
 		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
 
 		new_pud = kvm_pud_mkhuge(new_pud);
@@ -1975,7 +1975,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pud = kvm_s2pud_mkexec(new_pud);
 
 		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
-	} else if (vma_pagesize == PMD_SIZE) {
+	} else if (vma_pagesize == S2_PMD_SIZE) {
 		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
 
 		new_pmd = kvm_pmd_mkhuge(new_pmd);
--
2.23.0
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-08-22  2:44 [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly Gavin Shan
@ 2020-08-22 10:01 ` Marc Zyngier
  2020-08-22 23:59   ` Gavin Shan
  2020-09-02 10:59 ` Alexandru Elisei
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2020-08-22 10:01 UTC (permalink / raw)
To: Gavin Shan; +Cc: shan.gavin, Will Deacon, kvmarm

Hi Gavin,

Adding the usual suspects to the Cc list.

On Sat, 22 Aug 2020 03:44:44 +0100,
Gavin Shan <gshan@redhat.com> wrote:
>
> Depending on the kernel configuration, PUD_SIZE could be equal to
> PMD_SIZE. For example, both of them are 512MB with the following
> kernel configuration. In this case, both PUD and PMD are folded
> to PGD.
>
>   CONFIG_ARM64_64K_PAGES   y
>   CONFIG_ARM64_VA_BITS     42
>   CONFIG_PGTABLE_LEVELS    2
>
> With the above configuration, the stage2 PUD is used to backup the
> 512MB huge page when the stage2 mapping is built. During the mapping,
> the PUD and its subordinate levels of page table entries are unmapped
> if the PUD is present and not huge page sensitive in stage2_set_pud_huge().
> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table
> entries are zapped. It eventually leads to PUD's present bit can't be
> cleared successfully and infinite loop in stage2_set_pud_huge().
>
> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
> huge page. For this particular case, the stage2 PMD entry should be
> used to backup the 512MB huge page with stage2_set_pmd_huge().

It isn't obvious to me how S2_PMD_SIZE can be different from PMD_SIZE,
and the current code certainly expects both to be equal (just look at
how often S2_*_SIZE is used in the current code to convince yourself).

My guess is that some lesser tested configurations (such as 64k pages)
break that assumption, and result in the wrong thing happening. Could
you please shed some light on it?

> Fixes: 3c3736cd32bf ("KVM: arm/arm64: Fix handling of stage2 huge mappings")

This commit doesn't seem to match the code you're changing (it doesn't
even come near user_mem_abort()). Are there any intermediate commits
that would better explain the problem?

> Cc: stable@vger.kernel.org # v5.1+
> Reported-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  arch/arm64/kvm/mmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0121ef2c7c8d..deb1819ba9e2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1964,7 +1964,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		      (fault_status == FSC_PERM &&
>  		       stage2_is_exec(mmu, fault_ipa, vma_pagesize));
>
> -	if (vma_pagesize == PUD_SIZE) {
> +	if (vma_pagesize == S2_PUD_SIZE) {
>  		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
>
>  		new_pud = kvm_pud_mkhuge(new_pud);
> @@ -1975,7 +1975,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			new_pud = kvm_s2pud_mkexec(new_pud);
>
>  		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
> -	} else if (vma_pagesize == PMD_SIZE) {
> +	} else if (vma_pagesize == S2_PMD_SIZE) {
>  		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
>
>  		new_pmd = kvm_pmd_mkhuge(new_pmd);
> --
> 2.23.0

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-08-22 10:01 ` Marc Zyngier
@ 2020-08-22 23:59   ` Gavin Shan
  2020-09-01 16:50     ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Gavin Shan @ 2020-08-22 23:59 UTC (permalink / raw)
To: Marc Zyngier; +Cc: shan.gavin, Will Deacon, kvmarm

Hi Marc,

On 8/22/20 8:01 PM, Marc Zyngier wrote:
> On Sat, 22 Aug 2020 03:44:44 +0100,
> Gavin Shan <gshan@redhat.com> wrote:
>>
>> Depending on the kernel configuration, PUD_SIZE could be equal to
>> PMD_SIZE. For example, both of them are 512MB with the following
>> kernel configuration. In this case, both PUD and PMD are folded
>> to PGD.
>>
>>   CONFIG_ARM64_64K_PAGES   y
>>   CONFIG_ARM64_VA_BITS     42
>>   CONFIG_PGTABLE_LEVELS    2
>>
>> With the above configuration, the stage2 PUD is used to backup the
>> 512MB huge page when the stage2 mapping is built. During the mapping,
>> the PUD and its subordinate levels of page table entries are unmapped
>> if the PUD is present and not huge page sensitive in stage2_set_pud_huge().
>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table
>> entries are zapped. It eventually leads to PUD's present bit can't be
>> cleared successfully and infinite loop in stage2_set_pud_huge().
>>
>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
>> huge page. For this particular case, the stage2 PMD entry should be
>> used to backup the 512MB huge page with stage2_set_pmd_huge().
>
> It isn't obvious to me how S2_PMD_SIZE can be different from PMD_SIZE,
> and the current code certainly expects both to be equal (just look at
> how often S2_*_SIZE is used in the current code to convince yourself).
>
> My guess is that some lesser tested configurations (such as 64k pages)
> break that assumption, and result in the wrong thing happening. Could
> you please shed some light on it?
>

With the following kernel configuration, PUD_SIZE and PMD_SIZE are equal
and both of them are 512MB because P4D/PUD/PMD are folded into PGD.

  CONFIG_ARM64_64K_PAGES   y
  CONFIG_ARM64_VA_BITS     42
  CONFIG_PGTABLE_LEVELS    2

  PMD_SIZE      512MB   (include/asm-generic/pgtable-nopmd.h)
  PUD_SIZE      512MB   (include/asm-generic/pgtable-nopud.h)
  P4D_SIZE      512MB   (include/asm-generic/pgtable-nop4d.h)
  S2_PMD_SIZE   512MB   (stage2_pgtable.h)
  S2_PUD_SIZE   4TB     (stage2_pgtable.h)

For this particular case, S2_PMD_SIZE and PMD_SIZE are equal and both of
them are 512MB. However, the issue is that the wrong size
(PMD_SIZE/PUD_SIZE) is checked to decide between stage2_set_pud_huge()
and stage2_set_pmd_huge() in user_mem_abort(). It causes
stage2_set_pud_huge() to be called to map the 512MB huge page, meaning a
stage2 PUD is used to back the 512MB huge page.

In stage2_set_pud_huge(), the S2 page table entries are zapped for the
range ((addr & S2_PUD_MASK), S2_PUD_SIZE), whose size is 4TB. However,
we're mapping a 512MB huge page (block). It means unrelated page table
entries are cleared.

static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
			  struct kvm_memory_slot *memslot, unsigned long hva,
			  unsigned long fault_status)
{
	:
	if (vma_pagesize == PUD_SIZE) {        /* PUD_SIZE == 512MB */
		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
	} else if (vma_pagesize == PMD_SIZE) { /* PMD_SIZE == 512MB */
		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
	} else {
		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
	}
	:
}

The issue was initially reported by Eric and it can be reproduced on an
upstream kernel/qemu with the configuration options above, enabling 64KB
page size and 42 VA bits (CONFIG_ARM64_VA_BITS). Here is the command I
used to reproduce the issue. Note that the IPA limit reported from the
machine where I reproduced the issue is 44 bits, but qemu just uses 40
bits. It means the stage2 pagetable has 3 levels.

start_vm_aarch64_hugetlbfs() {
	echo 32 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages

	/home/gavin/sandbox/qemu.main/aarch64-softmmu/qemu-system-aarch64  \
	--enable-kvm -machine virt,gic-version=host                        \
	-cpu host -smp 8,sockets=8,cores=1,threads=1                       \
	-m 8G -mem-prealloc -mem-path /dev/hugepages                       \
	-monitor none -serial mon:stdio -nographic -s                      \
	-bios /home/gavin/sandbox/qemu.main/pc-bios/edk2-aarch64-code.fd   \
	-kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image      \
	-initrd /home/gavin/sandbox/images/rootfs.cpio.xz                  \
	-append "earlycon=pl011,mmio,0x9000000"                            \
	-device virtio-net-pci,netdev=unet,mac=52:54:00:f1:26:a6           \
	-netdev user,id=unet,hostfwd=tcp::50959-:22                        \
	-drive file=/home/gavin/sandbox/images/vm.img,if=none,format=raw,id=nvme0  \
	-device nvme,drive=nvme0,serial=foo                                \
	-drive file=/home/gavin/sandbox/images/vm1.img,if=none,format=raw,id=nvme1 \
	-device nvme,drive=nvme1,serial=foo1
}

>> Fixes: 3c3736cd32bf ("KVM: arm/arm64: Fix handling of stage2 huge mappings")
>
> This commit doesn't seem to match the code your changing (it doesn't
> even come near user_mem_abort()). Are there any intermediate commits
> that would better explain the problem?
>

When stage2_set_pud_huge() is called to map the 512MB huge page, we run
into an infinite loop retrying the unmap of the memory range
(S2_PUD_SIZE). With this commit reverted, we didn't reproduce the issue.
The commit was identified by "git bisect".

>> Cc: stable@vger.kernel.org # v5.1+
>> Reported-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Tested-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  arch/arm64/kvm/mmu.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 0121ef2c7c8d..deb1819ba9e2 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1964,7 +1964,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		      (fault_status == FSC_PERM &&
>>  		       stage2_is_exec(mmu, fault_ipa, vma_pagesize));
>>
>> -	if (vma_pagesize == PUD_SIZE) {
>> +	if (vma_pagesize == S2_PUD_SIZE) {
>>  		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
>>
>>  		new_pud = kvm_pud_mkhuge(new_pud);
>> @@ -1975,7 +1975,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			new_pud = kvm_s2pud_mkexec(new_pud);
>>
>>  		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
>> -	} else if (vma_pagesize == PMD_SIZE) {
>> +	} else if (vma_pagesize == S2_PMD_SIZE) {
>>  		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
>>
>>  		new_pmd = kvm_pmd_mkhuge(new_pmd);
>> --
>> 2.23.0
>>

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-08-22 23:59   ` Gavin Shan
@ 2020-09-01 16:50     ` Marc Zyngier
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2020-09-01 16:50 UTC (permalink / raw)
To: Gavin Shan; +Cc: shan.gavin, Will Deacon, kvmarm

Hi Gavin,

On 2020-08-23 00:59, Gavin Shan wrote:
> Hi Marc,
>
> On 8/22/20 8:01 PM, Marc Zyngier wrote:
>> On Sat, 22 Aug 2020 03:44:44 +0100,
>> Gavin Shan <gshan@redhat.com> wrote:
>>>
>>> Depending on the kernel configuration, PUD_SIZE could be equal to
>>> PMD_SIZE. For example, both of them are 512MB with the following
>>> kernel configuration. In this case, both PUD and PMD are folded
>>> to PGD.
>>>
>>>   CONFIG_ARM64_64K_PAGES   y
>>>   CONFIG_ARM64_VA_BITS     42
>>>   CONFIG_PGTABLE_LEVELS    2
>>>
>>> With the above configuration, the stage2 PUD is used to backup the
>>> 512MB huge page when the stage2 mapping is built. During the mapping,
>>> the PUD and its subordinate levels of page table entries are unmapped
>>> if the PUD is present and not huge page sensitive in
>>> stage2_set_pud_huge().
>>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page
>>> table
>>> entries are zapped. It eventually leads to PUD's present bit can't be
>>> cleared successfully and infinite loop in stage2_set_pud_huge().
>>>
>>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
>>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
>>> huge page. For this particular case, the stage2 PMD entry should be
>>> used to backup the 512MB huge page with stage2_set_pmd_huge().
>>
>> It isn't obvious to me how S2_PMD_SIZE can be different from PMD_SIZE,
>> and the current code certainly expects both to be equal (just look at
>> how often S2_*_SIZE is used in the current code to convince yourself).
>>
>> My guess is that some lesser tested configurations (such as 64k pages)
>> break that assumption, and result in the wrong thing happening. Could
>> you please shed some light on it?
>>
>
> With the following kernel configuration, PUD_SIZE and PMD_SIZE are
> equal
> and both of them are 512MB because P4D/PUD/PMD are folded into PGD.
>
>   CONFIG_ARM64_64K_PAGES   y
>   CONFIG_ARM64_VA_BITS     42
>   CONFIG_PGTABLE_LEVELS    2
>
>   PMD_SIZE      512MB   (include/asm-generic/pgtable-nopmd.h)
>   PUD_SIZE      512MB   (include/asm-generic/pgtable-nopud.h)
>   P4D_SIZE      512MB   (include/asm-generic/pgtable-nop4d.h)
>   S2_PMD_SIZE   512MB   (stage2_pgtable.h)
>   S2_PUD_SIZE   4TB     (stage2_pgtable.h)

For a start, this last value makes no sense at all. If we only have
2 levels, S2_PUD_SIZE is wrong, and should either be ignored or be
the same as S2_PMD_SIZE (4TB represents the whole of the guest's
IPA space).

> For this particular case, S2_PMD_SIZE and PMD_SIZE are equal and both
> of them are 512MB. However, the issue is wrong size (PMD_SIZE/PUD_SIZE)
> is checked to call stage2_set_pud_huge() or stage2_set_pmd_huge() in
> user_mem_abort(). It causes stage2_set_pud_huge() is called to map the
> 512MB huge page, meaning stage2 PUD is used to back the 512MB huge
> page.

Which should be fine, because that's all we can map. 2 levels,
remember? The real bug is that we consider that mapping a PUD is
valid, while there is no real PUD in this context.

> In stage2_set_pud_huge(), the S2 page table entries are zapped for the
> range ((addr & S2_PUD_MASK), S2_PUD_SIZE), whose size is 4TB. However,
> we're mapping 512MB huge page (block). It means unrelated page table
> entries are cleared.

We can't map 4TB. That's not possible for such a configuration (and I
doubt you have more than 4TB of contiguous memory on your system).

> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> 			  struct kvm_memory_slot *memslot, unsigned long hva,
> 			  unsigned long fault_status)
> {
> 	:
> 	if (vma_pagesize == PUD_SIZE) {        /* PUD_SIZE == 512MB */
> 		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
> 	} else if (vma_pagesize == PMD_SIZE) { /* PMD_SIZE == 512MB */
> 		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
> 	} else {
> 		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
> 	}
> 	:
> }
>
> The issue was initially reported by Eric and it can be reproduced on
> upstream kernel/qemu with the configurations to enable 64KB page size
> and 42-bits VA bits (CONFIG_ARM64_VA_BITS). Here is the command I used
> to reproduce the issue. Note that the IPA limit reported from the
> machine where I reproduced the issue is 44-bits, but qemu just uses
> 40-bits. It means the stage2 pagetable has 3 levels.

No. KVM, in its current form, cannot have more levels at stage-2 than
it has at stage-1 (that's one of the things we are fixing with Will's
patches).

> start_vm_aarch64_hugetlbfs() {
> 	echo 32 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages

Ah, hugetlb. I wonder if we have something funky here. Alexandru (cc'd)
was looking at something in this area (see [1]).

> 	/home/gavin/sandbox/qemu.main/aarch64-softmmu/qemu-system-aarch64  \
> 	--enable-kvm -machine virt,gic-version=host                        \
> 	-cpu host -smp 8,sockets=8,cores=1,threads=1                       \
> 	-m 8G -mem-prealloc -mem-path /dev/hugepages                       \
> 	-monitor none -serial mon:stdio -nographic -s                      \
> 	-bios /home/gavin/sandbox/qemu.main/pc-bios/edk2-aarch64-code.fd   \
> 	-kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image      \
> 	-initrd /home/gavin/sandbox/images/rootfs.cpio.xz                  \
> 	-append "earlycon=pl011,mmio,0x9000000"                            \
> 	-device virtio-net-pci,netdev=unet,mac=52:54:00:f1:26:a6           \
> 	-netdev user,id=unet,hostfwd=tcp::50959-:22                        \
> 	-drive file=/home/gavin/sandbox/images/vm.img,if=none,format=raw,id=nvme0  \
> 	-device nvme,drive=nvme0,serial=foo                                \
> 	-drive file=/home/gavin/sandbox/images/vm1.img,if=none,format=raw,id=nvme1 \
> 	-device nvme,drive=nvme1,serial=foo1
> }

I'll try to reproduce this.

	M.

[1] https://lore.kernel.org/r/20200901133357.52640-1-alexandru.elisei@arm.com

--
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-08-22  2:44 [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly Gavin Shan
  2020-08-22 10:01 ` Marc Zyngier
@ 2020-09-02 10:59 ` Alexandru Elisei
  2020-09-02 11:10   ` Marc Zyngier
  1 sibling, 1 reply; 12+ messages in thread
From: Alexandru Elisei @ 2020-09-02 10:59 UTC (permalink / raw)
To: Gavin Shan, kvmarm; +Cc: maz, shan.gavin

Hi,

On 8/22/20 3:44 AM, Gavin Shan wrote:
> Depending on the kernel configuration, PUD_SIZE could be equal to
> PMD_SIZE. For example, both of them are 512MB with the following
> kernel configuration. In this case, both PUD and PMD are folded
> to PGD.
>
>   CONFIG_ARM64_64K_PAGES   y
>   CONFIG_ARM64_VA_BITS     42
>   CONFIG_PGTABLE_LEVELS    2
>
> With the above configuration, the stage2 PUD is used to backup the
> 512MB huge page when the stage2 mapping is built. During the mapping,
> the PUD and its subordinate levels of page table entries are unmapped
> if the PUD is present and not huge page sensitive in stage2_set_pud_huge().
> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table
> entries are zapped. It eventually leads to PUD's present bit can't be
> cleared successfully and infinite loop in stage2_set_pud_huge().
>
> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
> huge page. For this particular case, the stage2 PMD entry should be
> used to backup the 512MB huge page with stage2_set_pmd_huge().

I can reproduce this on my rockpro64 using kvmtool.

I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB
(checked using printk), and second, stage2_set_pud_huge() hangs. I'm
working on debugging them.

Thanks,
Alex

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-09-02 10:59 ` Alexandru Elisei
@ 2020-09-02 11:10   ` Marc Zyngier
  2020-09-02 11:53     ` Alexandru Elisei
  ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Marc Zyngier @ 2020-09-02 11:10 UTC (permalink / raw)
To: Alexandru Elisei; +Cc: kvmarm, shan.gavin

On 2020-09-02 11:59, Alexandru Elisei wrote:
> Hi,
>
> On 8/22/20 3:44 AM, Gavin Shan wrote:
>> Depending on the kernel configuration, PUD_SIZE could be equal to
>> PMD_SIZE. For example, both of them are 512MB with the following
>> kernel configuration. In this case, both PUD and PMD are folded
>> to PGD.
>>
>>   CONFIG_ARM64_64K_PAGES   y
>>   CONFIG_ARM64_VA_BITS     42
>>   CONFIG_PGTABLE_LEVELS    2
>>
>> With the above configuration, the stage2 PUD is used to backup the
>> 512MB huge page when the stage2 mapping is built. During the mapping,
>> the PUD and its subordinate levels of page table entries are unmapped
>> if the PUD is present and not huge page sensitive in
>> stage2_set_pud_huge().
>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page
>> table
>> entries are zapped. It eventually leads to PUD's present bit can't be
>> cleared successfully and infinite loop in stage2_set_pud_huge().
>>
>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
>> huge page. For this particular case, the stage2 PMD entry should be
>> used to backup the 512MB huge page with stage2_set_pmd_huge().
>
> I can reproduce this on my rockpro64 using kvmtool.
>
> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB
> (checked using printk), and second, stage2_set_pud_huge() hangs. I'm
> working on debugging them.

I have this as an immediate fix for the set_pud_huge hang, tested
on Seattle with 64k/42bits.

I can't wait to see the back of this code...

	M.

From 2a345a826a47f9061bb37045a1d89ea54b51fb80 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Wed, 2 Sep 2020 11:18:29 +0100
Subject: [PATCH] KVM: arm64: Do not try to map PUDs when they are folded into
 PMD

For the obscure cases where PMD and PUD are the same size
(64kB pages with 42bit VA, for example, which results in only
two levels of page tables), we can't map anything as a PUD,
because there is... erm... no PUD to speak of. Everything is
either a PMD or a PTE.

So let's only try and map a PUD when its size is different from
that of a PMD.

Cc: stable@vger.kernel.org
Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages at stage 2")
Reported-by: Gavin Shan <gshan@redhat.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..c3a92fa537fd 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1970,7 +1970,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		      (fault_status == FSC_PERM &&
 		       stage2_is_exec(mmu, fault_ipa, vma_pagesize));
 
-	if (vma_pagesize == PUD_SIZE) {
+	if (PUD_SIZE != PMD_SIZE && vma_pagesize == PUD_SIZE) {
 		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
 
 		new_pud = kvm_pud_mkhuge(new_pud);
-- 
2.28.0

--
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-09-02 11:10   ` Marc Zyngier
@ 2020-09-02 11:53     ` Alexandru Elisei
  2020-09-02 11:56       ` Alexandru Elisei
  2020-09-02 12:04       ` Marc Zyngier
  0 siblings, 2 replies; 12+ messages in thread
From: Alexandru Elisei @ 2020-09-02 11:53 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvmarm, shan.gavin

Hi Marc,

On 9/2/20 12:10 PM, Marc Zyngier wrote:
> On 2020-09-02 11:59, Alexandru Elisei wrote:
>> Hi,
>>
>> On 8/22/20 3:44 AM, Gavin Shan wrote:
>>> Depending on the kernel configuration, PUD_SIZE could be equal to
>>> PMD_SIZE. For example, both of them are 512MB with the following
>>> kernel configuration. In this case, both PUD and PMD are folded
>>> to PGD.
>>>
>>>   CONFIG_ARM64_64K_PAGES   y
>>>   CONFIG_ARM64_VA_BITS     42
>>>   CONFIG_PGTABLE_LEVELS    2
>>>
>>> With the above configuration, the stage2 PUD is used to backup the
>>> 512MB huge page when the stage2 mapping is built. During the mapping,
>>> the PUD and its subordinate levels of page table entries are unmapped
>>> if the PUD is present and not huge page sensitive in stage2_set_pud_huge().
>>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table
>>> entries are zapped. It eventually leads to PUD's present bit can't be
>>> cleared successfully and infinite loop in stage2_set_pud_huge().
>>>
>>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
>>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
>>> huge page. For this particular case, the stage2 PMD entry should be
>>> used to backup the 512MB huge page with stage2_set_pmd_huge().
>>
>> I can reproduce this on my rockpro64 using kvmtool.
>>
>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB (checked
>> using printk), and second, stage2_set_pud_huge() hangs. I'm working on
>> debugging them.
>
> I have this as an immediate fix for the set_pud_huge hang, tested
> on Seattle with 64k/42bits.
>
> I can't wait to see the back of this code...

The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd()
returns false (CONFIG_PGTABLE_LEVELS = 2):

	pudp = stage2_get_pud(mmu, cache, addr);
	VM_BUG_ON(!pudp);

	old_pud = *pudp;

	[..]

	// Returns 1 because !kvm_stage2_has_pmd()
	if (stage2_pud_present(kvm, old_pud)) {
		/*
		 * If we already have table level mapping for this block, unmap
		 * the range for this block and retry.
		 */
		if (!stage2_pud_huge(kvm, old_pud)) { // Always true because !kvm_stage2_has_pmd()
			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
			goto retry;
		}

And we end up jumping back to retry forever. IMO, in user_mem_abort(), if
PUD_SIZE == PMD_SIZE, we should try to map PMD_SIZE instead of PUD_SIZE.
Maybe something like this?

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..178267dec511 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * As for PUD huge maps, we must make sure that we have at least
 	 * 3 levels, i.e, PMD is not folded.
 	 */
-	if (vma_pagesize == PMD_SIZE ||
-	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
+	if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
+		vma_pagesize = PMD_SIZE;
+
+	if (vma_pagesize == PUD_SIZE || vma_pagesize == PUD_SIZE)
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);

Thanks,
Alex

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
  2020-09-02 11:53     ` Alexandru Elisei
@ 2020-09-02 11:56       ` Alexandru Elisei
  0 siblings, 0 replies; 12+ messages in thread
From: Alexandru Elisei @ 2020-09-02 11:56 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvmarm, shan.gavin

Hi,

On 9/2/20 12:53 PM, Alexandru Elisei wrote:
> [..]
> And we end up jumping back to retry forever. IMO, in user_mem_abort(), if
> PUD_SIZE == PMD_SIZE, we should try to map PMD_SIZE instead of PUD_SIZE.
> Maybe something like this?
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ba00bcc0c884..178267dec511 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * As for PUD huge maps, we must make sure that we have at least
>  	 * 3 levels, i.e, PMD is not folded.
>  	 */
> -	if (vma_pagesize == PMD_SIZE ||
> -	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> +	if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
> +		vma_pagesize = PMD_SIZE;
> +
> +	if (vma_pagesize == PUD_SIZE || vma_pagesize == PUD_SIZE)
>  		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>  	mmap_read_unlock(current->mm);

Erm... goes without saying, completely untested, and the check should
have been:

+	if (vma_pagesize == PUD_SIZE || vma_pagesize == PMD_SIZE)

Thanks,
Alex

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly 2020-09-02 11:53 ` Alexandru Elisei 2020-09-02 11:56 ` Alexandru Elisei @ 2020-09-02 12:04 ` Marc Zyngier 2020-09-02 13:58 ` Alexandru Elisei 1 sibling, 1 reply; 12+ messages in thread From: Marc Zyngier @ 2020-09-02 12:04 UTC (permalink / raw) To: Alexandru Elisei; +Cc: kvmarm, shan.gavin On 2020-09-02 12:53, Alexandru Elisei wrote: > Hi Marc, > > On 9/2/20 12:10 PM, Marc Zyngier wrote: >> On 2020-09-02 11:59, Alexandru Elisei wrote: >>> Hi, >>> >>> On 8/22/20 3:44 AM, Gavin Shan wrote: >>>> Depending on the kernel configuration, PUD_SIZE could be equal to >>>> PMD_SIZE. For example, both of them are 512MB with the following >>>> kernel configuration. In this case, both PUD and PMD are folded >>>> to PGD. >>>> >>>> CONFIG_ARM64_64K_PAGES y >>>> CONFIG_ARM64_VA_BITS 42 >>>> CONFIG_PGTABLE_LEVELS 2 >>>> >>>> With the above configuration, the stage2 PUD is used to backup the >>>> 512MB huge page when the stage2 mapping is built. During the >>>> mapping, >>>> the PUD and its subordinate levels of page table entries are >>>> unmapped >>>> if the PUD is present and not huge page sensitive in >>>> stage2_set_pud_huge(). >>>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page >>>> table >>>> entries are zapped. It eventually leads to PUD's present bit can't >>>> be >>>> cleared successfully and infinite loop in stage2_set_pud_huge(). >>>> >>>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of >>>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back >>>> the >>>> huge page. For this particular case, the stage2 PMD entry should be >>>> used to backup the 512MB huge page with stage2_set_pmd_huge(). >>> >>> I can reproduce this on my rockpro64 using kvmtool. >>> >>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB >>> (checked >>> using printk), and second, stage2_set_pud_huge() hangs. I'm working >>> on >>> debugging them. 
>> >> I have this as an immediate fix for the set_pud_huge hang, tested >> on Seattle with 64k/42bits. >> >> I can't wait to see the back of this code... > > The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd() > returns > false (CONFIG_PGTABLE_LEVELS = 2): > > pudp = stage2_get_pud(mmu, cache, addr); > VM_BUG_ON(!pudp); > > old_pud = *pudp; > > [..] > > // Returns 1 because !kvm_stage2_has_pmd() > if (stage2_pud_present(kvm, old_pud)) { > /* > * If we already have table level mapping for this block, unmap > * the range for this block and retry. > */ > if (!stage2_pud_huge(kvm, old_pud)) { // Always true because > !kvm_stage2_has_pmd() > unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE); > goto retry; > } > > And we end up jumping back to retry forever. IMO, in user_mem_abort(), > if PUD_SIZE > == PMD_SIZE, we should try to map PMD_SIZE instead of PUD_SIZE. Maybe > something > like this? Err... If PUD_SIZE == PMD_SIZE, what difference does it make to map one or the other? > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index ba00bcc0c884..178267dec511 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > * As for PUD huge maps, we must make sure that we have at > least > * 3 levels, i.e, PMD is not folded. > */ > - if (vma_pagesize == PMD_SIZE || > - (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) > + if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm)) > + vma_pagesize = PMD_SIZE; > + > + if (vma_pagesize == PUD_SIZE || vma_pagesize == PUD_SIZE) > gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> > PAGE_SHIFT; > mmap_read_unlock(current->mm); I don't think this solves anything in the 2 level case. The gist of the issue is that if we go on the PUD path, we end-up computing the wrong offset for the entry and either loop or do something silly. 
In either case, we read/write something we have no business touching. Gavin sidesteps the problem by using a fantasy PUD size, while I catch it by explicitly checking for the PMD == PUD case. M. -- Jazz is not dead. It just smells funny...
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly 2020-09-02 12:04 ` Marc Zyngier @ 2020-09-02 13:58 ` Alexandru Elisei 0 siblings, 0 replies; 12+ messages in thread From: Alexandru Elisei @ 2020-09-02 13:58 UTC (permalink / raw) To: Marc Zyngier; +Cc: kvmarm, shan.gavin Hi Marc, On 9/2/20 1:04 PM, Marc Zyngier wrote: > On 2020-09-02 12:53, Alexandru Elisei wrote: >> Hi Marc, >> >> On 9/2/20 12:10 PM, Marc Zyngier wrote: >>> On 2020-09-02 11:59, Alexandru Elisei wrote: >>>> Hi, >>>> >>>> On 8/22/20 3:44 AM, Gavin Shan wrote: >>>>> Depending on the kernel configuration, PUD_SIZE could be equal to >>>>> PMD_SIZE. For example, both of them are 512MB with the following >>>>> kernel configuration. In this case, both PUD and PMD are folded >>>>> to PGD. >>>>> >>>>> CONFIG_ARM64_64K_PAGES y >>>>> CONFIG_ARM64_VA_BITS 42 >>>>> CONFIG_PGTABLE_LEVELS 2 >>>>> >>>>> With the above configuration, the stage2 PUD is used to backup the >>>>> 512MB huge page when the stage2 mapping is built. During the mapping, >>>>> the PUD and its subordinate levels of page table entries are unmapped >>>>> if the PUD is present and not huge page sensitive in stage2_set_pud_huge(). >>>>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table >>>>> entries are zapped. It eventually leads to PUD's present bit can't be >>>>> cleared successfully and infinite loop in stage2_set_pud_huge(). >>>>> >>>>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of >>>>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the >>>>> huge page. For this particular case, the stage2 PMD entry should be >>>>> used to backup the 512MB huge page with stage2_set_pmd_huge(). >>>> >>>> I can reproduce this on my rockpro64 using kvmtool. >>>> >>>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB (checked >>>> using printk), and second, stage2_set_pud_huge() hangs. I'm working on >>>> debugging them. 
>>> >>> I have this as an immediate fix for the set_pud_huge hang, tested >>> on Seattle with 64k/42bits. >>> >>> I can't wait to see the back of this code... >> >> The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd() returns >> false (CONFIG_PGTABLE_LEVELS = 2): >> >> pudp = stage2_get_pud(mmu, cache, addr); >> VM_BUG_ON(!pudp); >> >> old_pud = *pudp; >> >> [..] >> >> // Returns 1 because !kvm_stage2_has_pmd() >> if (stage2_pud_present(kvm, old_pud)) { >> /* >> * If we already have table level mapping for this block, unmap >> * the range for this block and retry. >> */ >> if (!stage2_pud_huge(kvm, old_pud)) { // Always true because >> !kvm_stage2_has_pmd() >> unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE); >> goto retry; >> } >> >> And we end up jumping back to retry forever. IMO, in user_mem_abort(), >> if PUD_SIZE >> == PMD_SIZE, we should try to map PMD_SIZE instead of PUD_SIZE. Maybe something >> like this? > > Err... If PUD_SIZE == PMD_SIZE, what difference does it make to map > one or the other? > >> >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c >> index ba00bcc0c884..178267dec511 100644 >> --- a/arch/arm64/kvm/mmu.c >> +++ b/arch/arm64/kvm/mmu.c >> @@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> * As for PUD huge maps, we must make sure that we have at least >> * 3 levels, i.e, PMD is not folded. >> */ >> - if (vma_pagesize == PMD_SIZE || >> - (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) >> + if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm)) >> + vma_pagesize = PMD_SIZE; >> + >> + if (vma_pagesize == PUD_SIZE || vma_pagesize == PUD_SIZE) >> gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> >> PAGE_SHIFT; >> mmap_read_unlock(current->mm); > > I don't think this solves anything in the 2 level case. 
The gist of > the issue is that if we go on the PUD path, we end-up computing the > wrong offset for the entry and either loop or do something silly. > In either case, we read/write something we have no business touching. In my testing, I seem to be getting the right offset. For fault_ipa 0x8000 0000, in stage2_set_pud_huge I get pudp = mmu->pgd + 0x20, which looks like the correct address for the table entry (IPA[41:29] = 4). That address is valid because in kvm_init_stage2_mmu we allocate stage2_pgd_size = 16K (40 bits IPA so we only need to map bits [39:29] in the level 2 table). The reason why I am seeing the hang is what I tried to explain above: kvm_stage2_has_pmd() is always false, which means that in stage2_set_pud_huge, stage2_pud_present is always true and stage2_pud_huge is always false, which leads to jumping to the retry label in an infinite loop. > > Gavin sidesteps the problem by using a fantasy PUD size, while I > catch it by explicitly checking for the PMD == PUD case. My mistake, I shouldn't have posted a diff without testing it. You're right, my change fixes nothing because vma_pagesize still equals PUD_SIZE == PMD_SIZE. I tested your patch and the hang goes away. If you decide to keep your patch as-is: Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> As an aside, a comment explaining why the check for PUD_SIZE != PMD_SIZE is needed could make it easier for other people to understand what is going on. Thanks, Alex
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly 2020-09-02 11:10 ` Marc Zyngier 2020-09-02 11:53 ` Alexandru Elisei @ 2020-09-02 17:38 ` Auger Eric 2020-09-02 23:55 ` Gavin Shan 2 siblings, 0 replies; 12+ messages in thread From: Auger Eric @ 2020-09-02 17:38 UTC (permalink / raw) To: Marc Zyngier, Alexandru Elisei; +Cc: kvmarm, shan.gavin Hi Marc, On 9/2/20 1:10 PM, Marc Zyngier wrote: > On 2020-09-02 11:59, Alexandru Elisei wrote: >> Hi, >> >> On 8/22/20 3:44 AM, Gavin Shan wrote: >>> Depending on the kernel configuration, PUD_SIZE could be equal to >>> PMD_SIZE. For example, both of them are 512MB with the following >>> kernel configuration. In this case, both PUD and PMD are folded >>> to PGD. >>> >>> CONFIG_ARM64_64K_PAGES y >>> CONFIG_ARM64_VA_BITS 42 >>> CONFIG_PGTABLE_LEVELS 2 >>> >>> With the above configuration, the stage2 PUD is used to backup the >>> 512MB huge page when the stage2 mapping is built. During the mapping, >>> the PUD and its subordinate levels of page table entries are unmapped >>> if the PUD is present and not huge page sensitive in >>> stage2_set_pud_huge(). >>> Unfornately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table >>> entries are zapped. It eventually leads to PUD's present bit can't be >>> cleared successfully and infinite loop in stage2_set_pud_huge(). >>> >>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of >>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the >>> huge page. For this particular case, the stage2 PMD entry should be >>> used to backup the 512MB huge page with stage2_set_pmd_huge(). >> >> I can reproduce this on my rockpro64 using kvmtool. >> >> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB >> (checked >> using printk), and second, stage2_set_pud_huge() hangs. I'm working on >> debugging them. > > I have this as an immediate fix for the set_pud_huge hang, tested > on Seattle with 64k/42bits. 
> > I can't wait to see the back of this code... > > M. > > From 2a345a826a47f9061bb37045a1d89ea54b51fb80 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <maz@kernel.org> > Date: Wed, 2 Sep 2020 11:18:29 +0100 > Subject: [PATCH] KVM: arm64: Do not try to map PUDs when they are folded > into > PMD > > For the obscure cases where PMD and PUD are the same size > (64kB pages with 42bit VA, for example, which results in only > two levels of page tables), we can't map anything as a PUD, > because there is... erm... no PUD to speak of. Everything is > either a PMD or a PTE. > > So let's only try and map a PUD when its size is different from > that of a PMD. > > Cc: stable@vger.kernel.org > Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages > at stage 2") > Reported-by: Gavin Shan <gshan@redhat.com> > Reported-by: Eric Auger <eric.auger@redhat.com> > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/kvm/mmu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index ba00bcc0c884..c3a92fa537fd 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -1970,7 +1970,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > (fault_status == FSC_PERM && > stage2_is_exec(mmu, fault_ipa, vma_pagesize)); > > - if (vma_pagesize == PUD_SIZE) { > + if (PUD_SIZE != PMD_SIZE && vma_pagesize == PUD_SIZE) { > pud_t new_pud = kvm_pfn_pud(pfn, mem_type); > > new_pud = kvm_pud_mkhuge(new_pud); also works for me Tested-by: Eric Auger <eric.auger@redhat.com> Thanks Eric _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly 2020-09-02 11:10 ` Marc Zyngier 2020-09-02 11:53 ` Alexandru Elisei 2020-09-02 17:38 ` Auger Eric @ 2020-09-02 23:55 ` Gavin Shan 2 siblings, 0 replies; 12+ messages in thread From: Gavin Shan @ 2020-09-02 23:55 UTC (permalink / raw) To: Marc Zyngier, Alexandru Elisei; +Cc: kvmarm, shan.gavin Hi Marc, On 9/2/20 9:10 PM, Marc Zyngier wrote: > On 2020-09-02 11:59, Alexandru Elisei wrote: [...] > > From 2a345a826a47f9061bb37045a1d89ea54b51fb80 Mon Sep 17 00:00:00 2001 > From: Marc Zyngier <maz@kernel.org> > Date: Wed, 2 Sep 2020 11:18:29 +0100 > Subject: [PATCH] KVM: arm64: Do not try to map PUDs when they are folded into > PMD > > For the obscure cases where PMD and PUD are the same size > (64kB pages with 42bit VA, for example, which results in only > two levels of page tables), we can't map anything as a PUD, > because there is... erm... no PUD to speak of. Everything is > either a PMD or a PTE. > > So let's only try and map a PUD when its size is different from > that of a PMD. 
> > Cc: stable@vger.kernel.org > Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages at stage 2") > Reported-by: Gavin Shan <gshan@redhat.com> > Reported-by: Eric Auger <eric.auger@redhat.com> > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/kvm/mmu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > It worked for me as well: Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index ba00bcc0c884..c3a92fa537fd 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -1970,7 +1970,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > (fault_status == FSC_PERM && > stage2_is_exec(mmu, fault_ipa, vma_pagesize)); > > - if (vma_pagesize == PUD_SIZE) { > + if (PUD_SIZE != PMD_SIZE && vma_pagesize == PUD_SIZE) { > pud_t new_pud = kvm_pfn_pud(pfn, mem_type); > > new_pud = kvm_pud_mkhuge(new_pud); Thanks, Gavin
end of thread, other threads:[~2020-09-02 23:55 UTC | newest] Thread overview: 12+ messages -- 2020-08-22 2:44 [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly Gavin Shan 2020-08-22 10:01 ` Marc Zyngier 2020-08-22 23:59 ` Gavin Shan 2020-09-01 16:50 ` Marc Zyngier 2020-09-02 10:59 ` Alexandru Elisei 2020-09-02 11:10 ` Marc Zyngier 2020-09-02 11:53 ` Alexandru Elisei 2020-09-02 11:56 ` Alexandru Elisei 2020-09-02 12:04 ` Marc Zyngier 2020-09-02 13:58 ` Alexandru Elisei 2020-09-02 17:38 ` Auger Eric 2020-09-02 23:55 ` Gavin Shan