All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Ott <sebott@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>,
	Quentin Perret <qperret@google.com>,
	 Fuad Tabba <tabba@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	 Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>,
	 Mark Rutland <mark.rutland@arm.com>,
	linux-arm-kernel@lists.infradead.org,  kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM
Date: Mon, 5 May 2025 18:05:08 +0200 (CEST)	[thread overview]
Message-ID: <7863e387-0b91-fac5-9925-e461ae7b30cd@redhat.com> (raw)
In-Reply-To: <86ldrbgl2x.wl-maz@kernel.org>

On Mon, 5 May 2025, Marc Zyngier wrote:
> On Mon, 05 May 2025 15:01:24 +0100,
> Sebastian Ott <sebott@redhat.com> wrote:
>>
>> On Mon, 5 May 2025, Marc Zyngier wrote:
>>> On Mon, 05 May 2025 11:52:00 +0100,
>>> Sebastian Ott <sebott@redhat.com> wrote:
>>>> Doing back and forth migrations currently fails on arm after a couple iterations.
>>>> During the failing migration KVM_RUN exits via guest_abort and returns -ENOMEM.
>>>> I can reliably reproduce this by migrating between 2 qemu instances on an ampere
>>>> altra machine. This fails after < 5 iterations. In this case qemu would spit out
>>>> smth like this (other than that - nothing in the logs):
>>>>
>>>> error: kvm run failed Cannot allocate memory
>>>>  PC=0000aaaae7d48590 X00=0000aaaae80a2e00 X01=0000aaaae7ea2fc0
>>>> X02=0000000001d3a5d0 X03=0000aaaae7eace8c X04=000000003b9aca00
>>>> X05=000000000000004a X06=000000000000004a X07=0000000028000000
>>>> X08=0000000000001d70 X09=0000000000000018 X10=000144b7d0000000
>>>> X11=00ffffffffffffff X12=000000008378f367 X13=0000aaab1a202d70
>>>> X14=0000000000000000 X15=0000000000000000 X16=0000ffffa2e2f7a8
>>>> X17=0000ffffa2541f20 X18=000000000000a000 X19=84bfda6288cf2dd6
>>>> X20=0000aaab1a1f1ce0 X21=000000007fffffff X22=0000ffffc5431788
>>>> X23=0000aaab1a17db60 X24=0000ffffc5431770 X25=0000000100000000
>>>> X26=0000004100000000 X27=0000000000000001 X28=0000aaab1a1f1c20
>>>> X29=0000ffffc54316d0 X30=0000aaaae7f8cd24  SP=0000ffffc5431650
>>>> PSTATE=20001000 --C- EL0t
>>>>
>>>> Guest and host are otherwise idle, kvm is in normal VHE mode.
>>>>
>>>> Git bisect points to (fce886a60207 "KVM: arm64: Plumb the pKVM MMU in KVM")
>>>> I also double checked that by reverting this commit on top of 6.14.
>>>
>>> Thanks for find the triggering commit. Can you further identify *what*
>>> causes the -ENOMEM? The only new -ENOMEM in that patch is the one
>>> added to topup_hyp_memcache(), which shouldn't be called.
>>
>> The kvm_pgtable_stage2_map() call in user_mem_abort() returns -ENOMEM
>> because the memcache pointer was not initialized!
>>
>> It looks like smth like this without other conditions could do the trick:
>>
>> if (!is_protected_kvm_enabled())
>> 	memcache = &vcpu->arch.mmu_page_cache;
>> else
>> 	memcache = &vcpu->arch.pkvm_memcache;
>>
>> (I'll try this now)
>
> Right, we end-up with an uninitialised memcache variable. Why isn't
> the compiler screaming?

Yea. Also I was under the impression that these kind of warnings tend to
over indicate..

> I think you can indeed simply hoist the init of memcache very early
> on, which should solve hopefully solve the issue.

It solves the issue for me. Please note that in this case topup cache is
not called - I hope that this is not an issue (but it was also the case
before commit fce886a60207).

>>
>>> Also, a failure to allocate would leave some nastygram in the kernel
>>> log, so it is unlikely to be an actual failure to allocate.
>>>
>>> Is it the first KVM_RUN that fails after migration?
>>
>> Nope, it happens on the side that triggers the migration.
>
> Probably because splitting pages requires allocating some memory, and
> all of a sudden you trigger the allocation from the memcache. Boo.
>
> Thanks for spotting this, I'm looking forward to the fix!

------->8
From c594bbf9c3c4186594b798734ea9b1779be3b584 Mon Sep 17 00:00:00 2001
From: Sebastian Ott <sebott@redhat.com>
Date: Mon, 5 May 2025 11:09:58 -0400
Subject: [PATCH] KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()

Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the
initialization of the local memcache variable in user_mem_abort()
conditional, leaving a codepath where it is used uninitialized via
kvm_pgtable_stage2_map().

This can lead to migration failures where KVM_RUN exits with -ENOMEM.
Fix this by making sure that memcache is always valid.

Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
Signed-off-by: Sebastian Ott <sebott@redhat.com>
---
  arch/arm64/kvm/mmu.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 754f2fe0cc67..6c3c658c5f29 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1501,6 +1501,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  		return -EFAULT;
  	}

+	if (!is_protected_kvm_enabled())
+		memcache = &vcpu->arch.mmu_page_cache;
+	else
+		memcache = &vcpu->arch.pkvm_memcache;
+
  	/*
  	 * Permission faults just need to update the existing leaf entry,
  	 * and so normally don't require allocations from the memcache. The
@@ -1511,10 +1516,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);

  		if (!is_protected_kvm_enabled()) {
-			memcache = &vcpu->arch.mmu_page_cache;
  			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
  		} else {
-			memcache = &vcpu->arch.pkvm_memcache;
  			ret = topup_hyp_memcache(memcache, min_pages);
  		}
  		if (ret)

base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
-- 
2.49.0


  reply	other threads:[~2025-05-05 16:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-05 10:52 LM regression: fce886a60207 KVM: arm64: Plumb the pKVM MMU in KVM Sebastian Ott
2025-05-05 11:34 ` Marc Zyngier
2025-05-05 14:01   ` Sebastian Ott
2025-05-05 14:59     ` Marc Zyngier
2025-05-05 16:05       ` Sebastian Ott [this message]
2025-05-05 16:26         ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7863e387-0b91-fac5-9925-e461ae7b30cd@redhat.com \
    --to=sebott@redhat.com \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=qperret@google.com \
    --cc=tabba@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.