Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.cs.columbia.edu, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>, Will Deacon <will@kernel.org>,
	Quentin Perret <qperret@google.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
Date: Wed, 21 Dec 2022 09:35:06 +0000	[thread overview]
Message-ID: <86pmcdaylx.wl-maz@kernel.org> (raw)
In-Reply-To: <Y6IteDoK406o9pM+@google.com>

On Tue, 20 Dec 2022 21:47:36 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hi Marc,
> 
> On Tue, Dec 20, 2022 at 08:09:21PM +0000, Marc Zyngier wrote:
> > A recent development on the EFI front has resulted in guests having
> > their page tables baked in the firmware binary, and mapped into
> > the IPA space as part as a read-only memslot.
> 
> as part of a
> 
> > Not only this is legitimate, but it also results in added security,
> > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > in the guest taking an abort it won't recover from (the PTs mapping the
> > vectors will suffer freom the same problem...).
> 
> To be clear, the read-only page tables already have the AF set,
> right?  They certainly must, or else the guest isn't getting far :)

Yes, the guest definitely has the AF set in the PT, and is not trying
to use the HW-assisted AF (which obviously wouldn't work).

>
> I understand you're trying to describe _why_ we promote S1PTW to a
> write, but doing it inline with the context of the EFI issue makes
> it slightly unclear. Could you break these ideas up into two
> paragraphs and maybe spell out the fault conditions a bit more?
> 
>   A recent development on the EFI front has resulted in guests having
>   their page tables baked in the firmware binary, and mapped into the
>   IPA space as part of a read-only memslot. Not only is this legitimate,
>   but it also results in added security, so thumbs up.
> 
>   It is possible to take an S1PTW translation fault if the S1 PTs are
>   unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
>   write to correctly handle hardware AF/DB updates to the S1 PTs.
>   Furthermore, KVM injects a data abort into the guest for S1PTW writes.
>   In the aforementioned case this results in the guest taking an abort
>   it won't recover from, as the S1 PTs mapping the vectors suffer from
>   the same problem.
> 
> Dunno, maybe I stink at reading which is why I got confused in the
> first place.

Nothing wrong with you, just that my write-up is indeed sloppy. I'll
copy paste the above, thanks!

> 
> > So clearly our handling is... wrong.
> > 
> > Instead, switch to a two-pronged approach:
> > 
> > - On S1PTW translation fault, handle the fault as a read
> > 
> > - On S1PTW permission fault, handle the fault as a write
> > 
> > This is of no consequence to SW that *writes* to its PTs (the write
> > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > use AF/DB anyway, as that'd be wrong.
> > 
> > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > fault on S1PTW permission fault on instruction fetch") do we end-up
> > with two back-to-back faults (page being evicted and faulted back).
> > I don't think this is a case worth optimising for.
> > 
> > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index 9bdba47f7e14..fd6ad8b21f85 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
> >  
> >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  {
> > -	if (kvm_vcpu_abt_iss1tw(vcpu))
> > -		return true;
> > +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +		/*
> > +		 * Only a permission fault on a S1PTW should be
> > +		 * considered as a write. Otherwise, page tables baked
> > +		 * in a read-only memslot will result in an exception
> > +		 * being delivered in the guest.
> 
> Somewhat of a tangent, but:
> 
> Aren't we somewhat unaligned with the KVM UAPI by injecting an
> exception in this case? I know we've been doing it for a while, but it
> flies in the face of the rules outlined in the
> KVM_SET_USER_MEMORY_REGION documentation.

That's an interesting point, and I certainly haven't considered that
for faults introduced by page table walks.

I'm not sure what userspace can do with that though. The problem is
that this is a write for which we don't have useful data: although we
know it is a page-table walker access, we don't know what it was about
to write. The instruction that caused the write is meaningless (it
could either be a load, a store, or an instruction fetch). How do you
populate the data[] field then?

If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
userspace the full ESR and ask it to sort it out. I doubt it will be
able to, but hey, maybe it is worth a shot. This would need to be a
different exit reason though, as NISV is explicitly for non-memslot
stuff.

In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
S1 PTW.

>
> > +		 * The drawback is that we end-up fauling twice if the
> 
> typo: s/fauling/faulting/
> 
> > +		 * guest is using any of HW AF/DB: a translation fault
> > +		 * to map the page containing the PT (read only at
> > +		 * first), then a permission fault to allow the flags
> > +		 * to be set.
> > +		 */
> > +		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > +		case ESR_ELx_FSC_PERM:
> > +			return true;
> > +		default:
> > +			return false;
> > +		}
> > +	}
> >  
> >  	if (kvm_vcpu_trap_is_iabt(vcpu))
> >  		return false;
> > -- 
> > 2.34.1
> > 
> 
> Besides the changelog/comment suggestions, the patch looks good to me.
> 
> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

Thanks for the quick review! I'll wait a bit before respinning the
series, as I'd like to get closure on the UAPI point you have raised.

Cheers,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel