* LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
@ 2025-05-05 10:52 Sebastian Ott
  2025-05-05 11:34 ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread

From: Sebastian Ott @ 2025-05-05 10:52 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Quentin Perret, Fuad Tabba
  Cc: Catalin Marinas, Will Deacon, Mark Brown, Mark Rutland,
      linux-arm-kernel, kvmarm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

Hey,

Doing back and forth migrations currently fails on arm64 after a couple of iterations.
During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
I can reliably reproduce this by migrating between 2 qemu instances on an Ampere
Altra machine. This fails after < 5 iterations. In this case qemu would spit out
something like this (other than that - nothing in the logs):

error: kvm run failed Cannot allocate memory
PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
X05=000000000000004a X06=000000000000004a X07=0000000028000000
X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
X29=0000ffffc54316d0 X30=0000aaaae7f8cd24 SP=0000ffffc5431650
PSTATE=20001000 --C- EL0t

Guest and host are otherwise idle, kvm is in normal VHE mode.

Git bisect points to fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM").
I also double-checked that by reverting this commit on top of 6.14.
Sebastian

[-- Attachment #2: Type: text/plain, Size: 2821 bytes --]

git bisect log
# bad: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
# good: [ffd294d346d185b70e28b1a28abe367bbfe53c04] Linux 6.13
git bisect start 'bad' 'good'
# good: [2c8d2a510c15c003749e43ac2b8e1bc79a7a00d6] Merge tag 'sound-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 2c8d2a510c15c003749e43ac2b8e1bc79a7a00d6
# good: [9ff28f2fad67e173ed25b8c3a183b15da5445d2d] Merge tag 'loongarch-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
git bisect good 9ff28f2fad67e173ed25b8c3a183b15da5445d2d
# bad: [243899076c3efdf98d8e922a802896424a597580] Merge tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
git bisect bad 243899076c3efdf98d8e922a802896424a597580
# bad: [95d7e8226106e3445b0d877015f4192c47d23637] Merge tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
git bisect bad 95d7e8226106e3445b0d877015f4192c47d23637
# bad: [7c775c6056d07eb777f37c7ac1340115b27dc9f8] Merge tag 'dmaengine-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
git bisect bad 7c775c6056d07eb777f37c7ac1340115b27dc9f8
# bad: [f785692ff545aecb517d2609864e1c6d838329e6] Merge tag 'stop-machine.2025.01.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
git bisect bad f785692ff545aecb517d2609864e1c6d838329e6
# bad: [080612b2942ab7947303029e1fa33117b5280ece] Merge branch kvm-arm64/nv-timers into kvmarm-master/next
git bisect bad 080612b2942ab7947303029e1fa33117b5280ece
# bad: [d0670128d42fa170bf8ba878cd23504c5c5cccc7] Merge branch kvm-arm64/pkvm-np-guest into kvmarm-master/next
git bisect bad d0670128d42fa170bf8ba878cd23504c5c5cccc7
# good: [c4a6ed85455979ef3fbadc2f1bdf18734b0ecea6] KVM: arm64: Promote guest ownership for DBGxVR/DBGxCR reads
git bisect good c4a6ed85455979ef3fbadc2f1bdf18734b0ecea6
# good: [d0bd3e6570aee42766e7bd884734ae078667ea1e] KVM: arm64: Introduce __pkvm_host_share_guest()
git bisect good d0bd3e6570aee42766e7bd884734ae078667ea1e
# good: [76f0b18b3db57868fb0cabe691201aad3085b712] KVM: arm64: Introduce __pkvm_host_mkyoung_guest()
git bisect good 76f0b18b3db57868fb0cabe691201aad3085b712
# good: [e912efed485a4c50bdc3934ae647e257ef568ef6] KVM: arm64: Introduce the EL1 pKVM MMU
git bisect good e912efed485a4c50bdc3934ae647e257ef568ef6
# bad: [fce886a6020734d6253c2c5a3bc285e385cc5496] KVM: arm64: Plumb the pKVM MMU in KVM
git bisect bad fce886a6020734d6253c2c5a3bc285e385cc5496
# first bad commit: [fce886a6020734d6253c2c5a3bc285e385cc5496] KVM: arm64: Plumb the pKVM MMU in KVM

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
  2025-05-05 10:52 LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM Sebastian Ott
@ 2025-05-05 11:34 ` Marc Zyngier
  2025-05-05 14:01   ` Sebastian Ott
  0 siblings, 1 reply; 6+ messages in thread

From: Marc Zyngier @ 2025-05-05 11:34 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Oliver Upton, Quentin Perret, Fuad Tabba, Catalin Marinas,
      Will Deacon, Mark Brown, Mark Rutland, linux-arm-kernel, kvmarm,
      linux-kernel

On Mon, 05 May 2025 11:52:00 +0100,
Sebastian Ott <sebott@redhat.com> wrote:
>
> Hey,
>
> Doing back and forth migrations currently fails on arm64 after a couple of iterations.
> During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
> I can reliably reproduce this by migrating between 2 qemu instances on an Ampere
> Altra machine. This fails after < 5 iterations. In this case qemu would spit out
> something like this (other than that - nothing in the logs):
>
> error: kvm run failed Cannot allocate memory
> PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
> X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
> X05=000000000000004a X06=000000000000004a X07=0000000028000000
> X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
> X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
> X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
> X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
> X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
> X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
> X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
> X29=0000ffffc54316d0 X30=0000aaaae7f8cd24 SP=0000ffffc5431650
> PSTATE=20001000 --C- EL0t
>
> Guest and host are otherwise idle, kvm is in normal VHE mode.
>
> Git bisect points to (fce886a60207 "KVM: arm64: Plumb the pKVM MMU in KVM")
> I also double checked that by reverting this commit on top of 6.14.

Thanks for finding the triggering commit. Can you further identify *what*
causes the -ENOMEM? The only new -ENOMEM in that patch is the one
added to topup_hyp_memcache(), which shouldn't be called.

Also, a failure to allocate would leave some nastygram in the kernel
log, so it is unlikely to be an actual failure to allocate.

Is it the first KVM_RUN that fails after migration?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
  2025-05-05 11:34 ` Marc Zyngier
@ 2025-05-05 14:01   ` Sebastian Ott
  2025-05-05 14:59     ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread

From: Sebastian Ott @ 2025-05-05 14:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, Quentin Perret, Fuad Tabba, Catalin Marinas,
      Will Deacon, Mark Brown, Mark Rutland, linux-arm-kernel, kvmarm,
      linux-kernel

On Mon, 5 May 2025, Marc Zyngier wrote:
> On Mon, 05 May 2025 11:52:00 +0100,
> Sebastian Ott <sebott@redhat.com> wrote:
>> Doing back and forth migrations currently fails on arm64 after a couple of iterations.
>> During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
>> I can reliably reproduce this by migrating between 2 qemu instances on an Ampere
>> Altra machine. This fails after < 5 iterations. In this case qemu would spit out
>> something like this (other than that - nothing in the logs):
>>
>> error: kvm run failed Cannot allocate memory
>> PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
>> X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
>> X05=000000000000004a X06=000000000000004a X07=0000000028000000
>> X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
>> X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
>> X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
>> X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
>> X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
>> X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
>> X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
>> X29=0000ffffc54316d0 X30=0000aaaae7f8cd24 SP=0000ffffc5431650
>> PSTATE=20001000 --C- EL0t
>>
>> Guest and host are otherwise idle, kvm is in normal VHE mode.
>>
>> Git bisect points to (fce886a60207 "KVM: arm64: Plumb the pKVM MMU in KVM")
>> I also double checked that by reverting this commit on top of 6.14.
>
> Thanks for finding the triggering commit. Can you further identify *what*
> causes the -ENOMEM? The only new -ENOMEM in that patch is the one
> added to topup_hyp_memcache(), which shouldn't be called.

The kvm_pgtable_stage2_map() call in user_mem_abort() returns -ENOMEM
because the memcache pointer was not initialized!

It looks like something like this, without the other conditions, could do
the trick:

	if (!is_protected_kvm_enabled())
		memcache = &vcpu->arch.mmu_page_cache;
	else
		memcache = &vcpu->arch.pkvm_memcache;

(I'll try this now)

> Also, a failure to allocate would leave some nastygram in the kernel
> log, so it is unlikely to be an actual failure to allocate.
>
> Is it the first KVM_RUN that fails after migration?

Nope, it happens on the side that triggers the migration.

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread
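[Editorial note: the failure mode discussed above reduces to a classic C pattern — a pointer that is assigned only inside one conditional path while another path still uses it. The sketch below is a hypothetical reduction with made-up names, not the actual KVM code; it only models the control-flow shape of the bug and of the proposed fix.]

```c
#include <stddef.h>

struct cache { int pages; };

static struct cache host_cache;   /* stand-in for vcpu->arch.mmu_page_cache */
static struct cache pkvm_cache;   /* stand-in for vcpu->arch.pkvm_memcache */

/*
 * Shape of the bug: the selection of the memcache only happened on the
 * slow path that also topped up the cache, so a fault taking the fast
 * path handed an uninitialized (modeled here as NULL) pointer to the
 * map call, which then failed with -ENOMEM.
 */
static struct cache *pick_memcache_buggy(int protected_mode, int needs_topup)
{
	struct cache *memcache = NULL;	/* models the uninitialized stack slot */

	if (needs_topup)
		memcache = protected_mode ? &pkvm_cache : &host_cache;

	return memcache;	/* NULL whenever needs_topup == 0 */
}

/* Shape of the fix: hoist the selection ahead of any conditional path. */
static struct cache *pick_memcache_fixed(int protected_mode)
{
	return protected_mode ? &pkvm_cache : &host_cache;
}
```

With the selection hoisted, every caller gets a valid cache pointer regardless of whether a top-up was needed, which is exactly what the patch later in this thread does.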
* Re: LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
  2025-05-05 14:01 ` Sebastian Ott
@ 2025-05-05 14:59   ` Marc Zyngier
  2025-05-05 16:05     ` Sebastian Ott
  0 siblings, 1 reply; 6+ messages in thread

From: Marc Zyngier @ 2025-05-05 14:59 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Oliver Upton, Quentin Perret, Fuad Tabba, Catalin Marinas,
      Will Deacon, Mark Brown, Mark Rutland, linux-arm-kernel, kvmarm,
      linux-kernel

On Mon, 05 May 2025 15:01:24 +0100,
Sebastian Ott <sebott@redhat.com> wrote:
>
> On Mon, 5 May 2025, Marc Zyngier wrote:
> > On Mon, 05 May 2025 11:52:00 +0100,
> > Sebastian Ott <sebott@redhat.com> wrote:
> >> Doing back and forth migrations currently fails on arm64 after a couple of iterations.
> >> During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
> >> I can reliably reproduce this by migrating between 2 qemu instances on an Ampere
> >> Altra machine. This fails after < 5 iterations. In this case qemu would spit out
> >> something like this (other than that - nothing in the logs):
> >>
> >> error: kvm run failed Cannot allocate memory
> >> PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
> >> X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
> >> X05=000000000000004a X06=000000000000004a X07=0000000028000000
> >> X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
> >> X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
> >> X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
> >> X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
> >> X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
> >> X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
> >> X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
> >> X29=0000ffffc54316d0 X30=0000aaaae7f8cd24 SP=0000ffffc5431650
> >> PSTATE=20001000 --C- EL0t
> >>
> >> Guest and host are otherwise idle, kvm is in normal VHE mode.
> >>
> >> Git bisect points to (fce886a60207 "KVM: arm64: Plumb the pKVM MMU in KVM")
> >> I also double checked that by reverting this commit on top of 6.14.
> >
> > Thanks for finding the triggering commit. Can you further identify *what*
> > causes the -ENOMEM? The only new -ENOMEM in that patch is the one
> > added to topup_hyp_memcache(), which shouldn't be called.
>
> The kvm_pgtable_stage2_map() call in user_mem_abort() returns -ENOMEM
> because the memcache pointer was not initialized!
>
> It looks like something like this, without the other conditions, could do
> the trick:
>
> 	if (!is_protected_kvm_enabled())
> 		memcache = &vcpu->arch.mmu_page_cache;
> 	else
> 		memcache = &vcpu->arch.pkvm_memcache;
>
> (I'll try this now)

Right, we end up with an uninitialised memcache variable. Why isn't
the compiler screaming?

I think you can indeed simply hoist the init of memcache very early
on, which should hopefully solve the issue.

> > Also, a failure to allocate would leave some nastygram in the kernel
> > log, so it is unlikely to be an actual failure to allocate.
> >
> > Is it the first KVM_RUN that fails after migration?
>
> Nope, it happens on the side that triggers the migration.

Probably because splitting pages requires allocating some memory, and
all of a sudden you trigger the allocation from the memcache. Boo.

Thanks for spotting this, I'm looking forward to the fix!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 6+ messages in thread
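[Editorial note: on Marc's question about the silent compiler — `-Wmaybe-uninitialized` is a best-effort, optimization-dependent analysis, and it is known to be weakest when the possibly-uninitialized value is merely passed to another function rather than dereferenced locally. The reduced sketch below (hypothetical names, not the KVM code) shows that shape; whether a given GCC/Clang version warns on it varies with the optimization level and inlining decisions.]

```c
#include <stddef.h>

static char scratch[16];

/* Stand-in for kvm_pgtable_stage2_map(): it only receives the pointer. */
static int do_map(void *mc)
{
	return (mc == scratch) ? 0 : -12;	/* -12 == -ENOMEM */
}

/*
 * Only one branch assigns `memcache`; the use below merely passes the
 * pointer to an opaque callee. This "pass it on, don't touch it" shape
 * is exactly where -Wmaybe-uninitialized tends to give up, which would
 * explain the lack of a warning in the real code.
 */
static int handler(int logging_active)
{
	void *memcache;		/* no initializer, as in the pre-fix code */

	if (logging_active)
		memcache = scratch;

	return do_map(memcache);	/* uninitialized read when !logging_active */
}
```

The defined-behavior path (`logging_active != 0`) works as expected; the other path is undefined behavior, which is precisely why the bug surfaced as a seemingly random -ENOMEM rather than a crash.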
* Re: LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
  2025-05-05 14:59 ` Marc Zyngier
@ 2025-05-05 16:05   ` Sebastian Ott
  2025-05-05 16:26     ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread

From: Sebastian Ott @ 2025-05-05 16:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, Quentin Perret, Fuad Tabba, Catalin Marinas,
      Will Deacon, Mark Brown, Mark Rutland, linux-arm-kernel, kvmarm,
      linux-kernel

On Mon, 5 May 2025, Marc Zyngier wrote:
> On Mon, 05 May 2025 15:01:24 +0100,
> Sebastian Ott <sebott@redhat.com> wrote:
>>
>> On Mon, 5 May 2025, Marc Zyngier wrote:
>>> On Mon, 05 May 2025 11:52:00 +0100,
>>> Sebastian Ott <sebott@redhat.com> wrote:
>>>> Doing back and forth migrations currently fails on arm64 after a couple of iterations.
>>>> During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
>>>> I can reliably reproduce this by migrating between 2 qemu instances on an Ampere
>>>> Altra machine. This fails after < 5 iterations. In this case qemu would spit out
>>>> something like this (other than that - nothing in the logs):
>>>>
>>>> error: kvm run failed Cannot allocate memory
>>>> PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
>>>> X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
>>>> X05=000000000000004a X06=000000000000004a X07=0000000028000000
>>>> X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
>>>> X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
>>>> X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
>>>> X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
>>>> X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
>>>> X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
>>>> X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
>>>> X29=0000ffffc54316d0 X30=0000aaaae7f8cd24 SP=0000ffffc5431650
>>>> PSTATE=20001000 --C- EL0t
>>>>
>>>> Guest and host are otherwise idle, kvm is in normal VHE mode.
>>>>
>>>> Git bisect points to (fce886a60207 "KVM: arm64: Plumb the pKVM MMU in KVM")
>>>> I also double checked that by reverting this commit on top of 6.14.
>>>
>>> Thanks for finding the triggering commit. Can you further identify *what*
>>> causes the -ENOMEM? The only new -ENOMEM in that patch is the one
>>> added to topup_hyp_memcache(), which shouldn't be called.
>>
>> The kvm_pgtable_stage2_map() call in user_mem_abort() returns -ENOMEM
>> because the memcache pointer was not initialized!
>>
>> It looks like something like this, without the other conditions, could do
>> the trick:
>>
>> 	if (!is_protected_kvm_enabled())
>> 		memcache = &vcpu->arch.mmu_page_cache;
>> 	else
>> 		memcache = &vcpu->arch.pkvm_memcache;
>>
>> (I'll try this now)
>
> Right, we end up with an uninitialised memcache variable. Why isn't
> the compiler screaming?

Yea. Also I was under the impression that these kinds of warnings tend to
over-indicate..

> I think you can indeed simply hoist the init of memcache very early
> on, which should hopefully solve the issue.

It solves the issue for me. Please note that in this case the topup cache
function is not called - I hope that this is not an issue (but it was also
the case before commit fce886a60207).

>>
>>> Also, a failure to allocate would leave some nastygram in the kernel
>>> log, so it is unlikely to be an actual failure to allocate.
>>>
>>> Is it the first KVM_RUN that fails after migration?
>>
>> Nope, it happens on the side that triggers the migration.
>
> Probably because splitting pages requires allocating some memory, and
> all of a sudden you trigger the allocation from the memcache. Boo.
>
> Thanks for spotting this, I'm looking forward to the fix!
------->8

From c594bbf9c3c4186594b798734ea9b1779be3b584 Mon Sep 17 00:00:00 2001
From: Sebastian Ott <sebott@redhat.com>
Date: Mon, 5 May 2025 11:09:58 -0400
Subject: [PATCH] KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()

Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the
initialization of the local memcache variable in user_mem_abort()
conditional, leaving a codepath where it is used uninitialized via
kvm_pgtable_stage2_map().

This can lead to migration failures where KVM_RUN exits with -ENOMEM.

Fix this by making sure that memcache is always valid.

Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
Signed-off-by: Sebastian Ott <sebott@redhat.com>
---
 arch/arm64/kvm/mmu.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 754f2fe0cc67..6c3c658c5f29 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1501,6 +1501,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
+	if (!is_protected_kvm_enabled())
+		memcache = &vcpu->arch.mmu_page_cache;
+	else
+		memcache = &vcpu->arch.pkvm_memcache;
+
 	/*
 	 * Permission faults just need to update the existing leaf entry,
 	 * and so normally don't require allocations from the memcache. The
@@ -1511,10 +1516,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
 
 	if (!is_protected_kvm_enabled()) {
-		memcache = &vcpu->arch.mmu_page_cache;
 		ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
 	} else {
-		memcache = &vcpu->arch.pkvm_memcache;
 		ret = topup_hyp_memcache(memcache, min_pages);
 	}
 	if (ret)

base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
-- 
2.49.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread
* Re: LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
  2025-05-05 16:05 ` Sebastian Ott
@ 2025-05-05 16:26   ` Marc Zyngier
  0 siblings, 0 replies; 6+ messages in thread

From: Marc Zyngier @ 2025-05-05 16:26 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Oliver Upton, Quentin Perret, Fuad Tabba, Catalin Marinas,
      Will Deacon, Mark Brown, Mark Rutland, linux-arm-kernel, kvmarm,
      linux-kernel

On Mon, 05 May 2025 17:05:08 +0100,
Sebastian Ott <sebott@redhat.com> wrote:
>
> On Mon, 5 May 2025, Marc Zyngier wrote:
> > On Mon, 05 May 2025 15:01:24 +0100,
> > Sebastian Ott <sebott@redhat.com> wrote:
> >>
> >> On Mon, 5 May 2025, Marc Zyngier wrote:
> >>> On Mon, 05 May 2025 11:52:00 +0100,
> >>> Sebastian Ott <sebott@redhat.com> wrote:
> >>>> Doing back and forth migrations currently fails on arm64 after a couple of iterations.
> >>>> During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
> >>>> I can reliably reproduce this by migrating between 2 qemu instances on an Ampere
> >>>> Altra machine. This fails after < 5 iterations. In this case qemu would spit out
> >>>> something like this (other than that - nothing in the logs):
> >>>>
> >>>> error: kvm run failed Cannot allocate memory
> >>>> PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
> >>>> X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
> >>>> X05=000000000000004a X06=000000000000004a X07=0000000028000000
> >>>> X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
> >>>> X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
> >>>> X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
> >>>> X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
> >>>> X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
> >>>> X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
> >>>> X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
> >>>> X29=0000ffffc54316d0 X30=0000aaaae7f8cd24 SP=0000ffffc5431650
> >>>> PSTATE=20001000 --C- EL0t
> >>>>
> >>>> Guest and host are otherwise idle, kvm is in normal VHE mode.
> >>>>
> >>>> Git bisect points to (fce886a60207 "KVM: arm64: Plumb the pKVM MMU in KVM")
> >>>> I also double checked that by reverting this commit on top of 6.14.
> >>>
> >>> Thanks for finding the triggering commit. Can you further identify *what*
> >>> causes the -ENOMEM? The only new -ENOMEM in that patch is the one
> >>> added to topup_hyp_memcache(), which shouldn't be called.
> >>
> >> The kvm_pgtable_stage2_map() call in user_mem_abort() returns -ENOMEM
> >> because the memcache pointer was not initialized!
> >>
> >> It looks like something like this, without the other conditions, could do
> >> the trick:
> >>
> >> 	if (!is_protected_kvm_enabled())
> >> 		memcache = &vcpu->arch.mmu_page_cache;
> >> 	else
> >> 		memcache = &vcpu->arch.pkvm_memcache;
> >>
> >> (I'll try this now)
> >
> > Right, we end up with an uninitialised memcache variable. Why isn't
> > the compiler screaming?
>
> Yea. Also I was under the impression that these kinds of warnings tend to
> over-indicate..

Quite. The obvious conclusion is that compilers are crap. Oh well.

> > I think you can indeed simply hoist the init of memcache very early
> > on, which should hopefully solve the issue.
>
> It solves the issue for me. Please note that in this case the topup cache
> function is not called - I hope that this is not an issue (but it was also
> the case before commit fce886a60207).

Indeed, and that's how I expect it to fail. We're also pretty lucky
that we get a NULL instead of some random pointer...

> From c594bbf9c3c4186594b798734ea9b1779be3b584 Mon Sep 17 00:00:00 2001
> From: Sebastian Ott <sebott@redhat.com>
> Date: Mon, 5 May 2025 11:09:58 -0400
> Subject: [PATCH] KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()
>
> Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the
> initialization of the local memcache variable in user_mem_abort()
> conditional, leaving a codepath where it is used uninitialized via
> kvm_pgtable_stage2_map().
>
> This can lead to migration failures where KVM_RUN exits with -ENOMEM.

Not only. It can fail on any path that requires a stage-2 allocation
without transitioning via a permission fault or logging. It is actually
pretty amazing that it fails so rarely...

> Fix this by making sure that memcache is always valid.
>
> Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
> Signed-off-by: Sebastian Ott <sebott@redhat.com>

+ Cc: stable@vger.kernel.org

> ---
>  arch/arm64/kvm/mmu.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 754f2fe0cc67..6c3c658c5f29 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1501,6 +1501,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return -EFAULT;
>  	}
>
> +	if (!is_protected_kvm_enabled())
> +		memcache = &vcpu->arch.mmu_page_cache;
> +	else
> +		memcache = &vcpu->arch.pkvm_memcache;
> +
>  	/*
>  	 * Permission faults just need to update the existing leaf entry,
>  	 * and so normally don't require allocations from the memcache. The
> @@ -1511,10 +1516,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
>
>  	if (!is_protected_kvm_enabled()) {
> -		memcache = &vcpu->arch.mmu_page_cache;
>  		ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
>  	} else {
> -		memcache = &vcpu->arch.pkvm_memcache;
>  		ret = topup_hyp_memcache(memcache, min_pages);
>  	}
>  	if (ret)

Feel free to drop the {}s that are now superfluous.

With the above addressed:

Reviewed-by: Marc Zyngier <maz@kernel.org>

Please post it as a standalone patch so that Oliver can add it to the
next batch of fixes for 6.15.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-05-05 16:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-05 10:52 LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM Sebastian Ott
2025-05-05 11:34 ` Marc Zyngier
2025-05-05 14:01   ` Sebastian Ott
2025-05-05 14:59     ` Marc Zyngier
2025-05-05 16:05       ` Sebastian Ott
2025-05-05 16:26         ` Marc Zyngier