[PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots
@ 2022-12-20 20:09 Marc Zyngier
  2022-12-20 20:09 ` [PATCH 1/3] KVM: arm64: Fix S1PTW handling " Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Marc Zyngier @ 2022-12-20 20:09 UTC (permalink / raw)
  To: kvmarm, kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
	Ard Biesheuvel, Will Deacon, Quentin Perret

Recent developments on the EFI front have resulted in guests that
simply won't boot if the page tables are in a read-only memslot and
that you're a bit unlucky in the way S2 gets paged in... The core
issue is related to the fact that we treat a S1PTW as a write, which
is close enough to what needs to be done. Until to get to RO memslots.

The first patch fixes this and is definitely a stable candidate. It
splits the faulting of page tables in two steps (RO translation fault,
followed by a writable permission fault -- should it even happen).
The second one is a potential optimisation. I'm not even sure it is
worth it. The last patch is totally optional, only tangentially
related, and randomly repainting stuff (maybe that's contagious, who
knows).

The whole thing is on top of Linus' tree as of today. The reason for
this very random choice is that there is a patch in v6.1-rc7 that
hides the problem, and that patch is reverted in rc8 (see commit
0ba09b1733878afe838fe35c310715fda3d46428). I also wanted to avoid
conflicts with kvmarm/next, so here you go.

I've tested the series on A55, M1 and M2. The original issue seems to
trigger best with 16kB pages, so please test with *other* page sizes!

	M.

Marc Zyngier (3):
  KVM: arm64: Fix S1PTW handling on RO memslots
  KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*

 arch/arm64/include/asm/esr.h            |  9 ++++
 arch/arm64/include/asm/kvm_arm.h        | 15 -------
 arch/arm64/include/asm/kvm_emulate.h    | 60 ++++++++++++++++++++-----
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c                    | 21 +++++----
 6 files changed, 71 insertions(+), 38 deletions(-)

-- 
2.34.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-20 20:09 [PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots Marc Zyngier
@ 2022-12-20 20:09 ` Marc Zyngier
  2022-12-20 21:47   ` Oliver Upton
  2022-12-22 13:01   ` Ard Biesheuvel
  2022-12-20 20:09 ` [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write Marc Zyngier
  2022-12-20 20:09 ` [PATCH 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* Marc Zyngier
  2 siblings, 2 replies; 20+ messages in thread
From: Marc Zyngier @ 2022-12-20 20:09 UTC (permalink / raw)
  To: kvmarm, kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
	Ard Biesheuvel, Will Deacon, Quentin Perret, stable

A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into
the IPA space as part as a read-only memslot.

Not only this is legitimate, but it also results in added security,
so thumbs up. However, this clashes mildly with our handling of a S1PTW
as a write to correctly handle AF/DB updates to the S1 PTs, and results
in the guest taking an abort it won't recover from (the PTs mapping the
vectors will suffer freom the same problem...).

So clearly our handling is... wrong.

Instead, switch to a two-pronged approach:

- On S1PTW translation fault, handle the fault as a read

- On S1PTW permission fault, handle the fault as a write

This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use AF/DB anyway, as that'd be wrong.

Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end-up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.

Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..fd6ad8b21f85 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_abt_iss1tw(vcpu))
-		return true;
+	if (kvm_vcpu_abt_iss1tw(vcpu)) {
+		/*
+		 * Only a permission fault on a S1PTW should be
+		 * considered as a write. Otherwise, page tables baked
+		 * in a read-only memslot will result in an exception
+		 * being delivered in the guest.
+		 *
+		 * The drawback is that we end-up fauling twice if the
+		 * guest is using any of HW AF/DB: a translation fault
+		 * to map the page containing the PT (read only at
+		 * first), then a permission fault to allow the flags
+		 * to be set.
+		 */
+		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+		case ESR_ELx_FSC_PERM:
+			return true;
+		default:
+			return false;
+		}
+	}
 
 	if (kvm_vcpu_trap_is_iabt(vcpu))
 		return false;
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-20 20:09 ` [PATCH 1/3] KVM: arm64: Fix S1PTW handling " Marc Zyngier
@ 2022-12-20 21:47   ` Oliver Upton
  2022-12-21  9:35     ` Marc Zyngier
  2022-12-22 13:01   ` Ard Biesheuvel
  1 sibling, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2022-12-20 21:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Ard Biesheuvel, Will Deacon,
	Quentin Perret, stable

Hi Marc,

On Tue, Dec 20, 2022 at 08:09:21PM +0000, Marc Zyngier wrote:
> A recent development on the EFI front has resulted in guests having
> their page tables baked in the firmware binary, and mapped into
> the IPA space as part as a read-only memslot.

as part of a

> Not only this is legitimate, but it also results in added security,
> so thumbs up. However, this clashes mildly with our handling of a S1PTW
> as a write to correctly handle AF/DB updates to the S1 PTs, and results
> in the guest taking an abort it won't recover from (the PTs mapping the
> vectors will suffer freom the same problem...).

To be clear, the read-only page tables already have the AF set, right?
They certainly must, or else the guest isn't getting far :)

I understand you're trying to describe _why_ we promote S1PTW to
a write, but doing it inline with the context of the EFI issue makes it
slightly unclear. Could you break these ideas up into two paragraphs and
maybe spell out the fault conditions a bit more?

  A recent development on the EFI front has resulted in guests having
  their page tables baked in the firmware binary, and mapped into the
  IPA space as part of a read-only memslot. Not only is this legitimate,
  but it also results in added security, so thumbs up.

  It is possible to take an S1PTW translation fault if the S1 PTs are
  unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
  write to correctly handle hardware AF/DB updates to the S1 PTs.
  Furthermore, KVM injects a data abort into the guest for S1PTW writes.
  In the aforementioned case this results in the guest taking an abort
  it won't recover from, as the S1 PTs mapping the vectors suffer from
  the same problem.

Dunno, maybe I stink at reading which is why I got confused in the first
place.

> So clearly our handling is... wrong.
> 
> Instead, switch to a two-pronged approach:
> 
> - On S1PTW translation fault, handle the fault as a read
> 
> - On S1PTW permission fault, handle the fault as a write
> 
> This is of no consequence to SW that *writes* to its PTs (the write
> will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> use AF/DB anyway, as that'd be wrong.
> 
> Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> fault on S1PTW permission fault on instruction fetch") do we end-up
> with two back-to-back faults (page being evicted and faulted back).
> I don't think this is a case worth optimising for.
> 
> Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 9bdba47f7e14..fd6ad8b21f85 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vcpu_abt_iss1tw(vcpu))
> -		return true;
> +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> +		/*
> +		 * Only a permission fault on a S1PTW should be
> +		 * considered as a write. Otherwise, page tables baked
> +		 * in a read-only memslot will result in an exception
> +		 * being delivered in the guest.

Somewhat of a tangent, but:

Aren't we somewhat unaligned with the KVM UAPI by injecting an
exception in this case? I know we've been doing it for a while, but it
flies in the face of the rules outlined in the
KVM_SET_USER_MEMORY_REGION documentation.

> +		 * The drawback is that we end-up fauling twice if the

typo: s/fauling/faulting/

> +		 * guest is using any of HW AF/DB: a translation fault
> +		 * to map the page containing the PT (read only at
> +		 * first), then a permission fault to allow the flags
> +		 * to be set.
> +		 */
> +		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> +		case ESR_ELx_FSC_PERM:
> +			return true;
> +		default:
> +			return false;
> +		}
> +	}
>  
>  	if (kvm_vcpu_trap_is_iabt(vcpu))
>  		return false;
> -- 
> 2.34.1
> 

Besides the changelog/comment suggestions, the patch looks good to me.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-20 21:47   ` Oliver Upton
@ 2022-12-21  9:35     ` Marc Zyngier
  2022-12-21 16:50       ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2022-12-21  9:35 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Ard Biesheuvel, Will Deacon,
	Quentin Perret, stable

On Tue, 20 Dec 2022 21:47:36 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hi Marc,
> 
> On Tue, Dec 20, 2022 at 08:09:21PM +0000, Marc Zyngier wrote:
> > A recent development on the EFI front has resulted in guests having
> > their page tables baked in the firmware binary, and mapped into
> > the IPA space as part as a read-only memslot.
> 
> as part of a
> 
> > Not only this is legitimate, but it also results in added security,
> > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > in the guest taking an abort it won't recover from (the PTs mapping the
> > vectors will suffer freom the same problem...).
> 
> To be clear, the read-only page tables already have the AF set,
> right?  They certainly must, or else the guest isn't getting far :)

Yes, the guest definitely has the AF set in the PT, and is not trying
to use the HW-assisted AF (which obviously wouldn't work).

>
> I understand you're trying to describe _why_ we promote S1PTW to a
> write, but doing it inline with the context of the EFI issue makes
> it slightly unclear. Could you break these ideas up into two
> paragraphs and maybe spell out the fault conditions a bit more?
> 
>   A recent development on the EFI front has resulted in guests having
>   their page tables baked in the firmware binary, and mapped into the
>   IPA space as part of a read-only memslot. Not only is this legitimate,
>   but it also results in added security, so thumbs up.
> 
>   It is possible to take an S1PTW translation fault if the S1 PTs are
>   unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
>   write to correctly handle hardware AF/DB updates to the S1 PTs.
>   Furthermore, KVM injects a data abort into the guest for S1PTW writes.
>   In the aforementioned case this results in the guest taking an abort
>   it won't recover from, as the S1 PTs mapping the vectors suffer from
>   the same problem.
> 
> Dunno, maybe I stink at reading which is why I got confused in the
> first place.

Nothing wrong with you, just that my write-up is indeed sloppy. I'll
copy paste the above, thanks!

> 
> > So clearly our handling is... wrong.
> > 
> > Instead, switch to a two-pronged approach:
> > 
> > - On S1PTW translation fault, handle the fault as a read
> > 
> > - On S1PTW permission fault, handle the fault as a write
> > 
> > This is of no consequence to SW that *writes* to its PTs (the write
> > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > use AF/DB anyway, as that'd be wrong.
> > 
> > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > fault on S1PTW permission fault on instruction fetch") do we end-up
> > with two back-to-back faults (page being evicted and faulted back).
> > I don't think this is a case worth optimising for.
> > 
> > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index 9bdba47f7e14..fd6ad8b21f85 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
> >  
> >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  {
> > -	if (kvm_vcpu_abt_iss1tw(vcpu))
> > -		return true;
> > +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +		/*
> > +		 * Only a permission fault on a S1PTW should be
> > +		 * considered as a write. Otherwise, page tables baked
> > +		 * in a read-only memslot will result in an exception
> > +		 * being delivered in the guest.
> 
> Somewhat of a tangent, but:
> 
> Aren't we somewhat unaligned with the KVM UAPI by injecting an
> exception in this case? I know we've been doing it for a while, but it
> flies in the face of the rules outlined in the
> KVM_SET_USER_MEMORY_REGION documentation.

That's an interesting point, and I certainly haven't considered that
for faults introduced by page table walks.

I'm not sure what userspace can do with that though. The problem is
that this is a write for which we don't have useful data: although we
know it is a page-table walker access, we don't know what it was about
to write. The instruction that caused the write is meaningless (it
could either be a load, a store, or an instruction fetch). How do you
populate the data[] field then?

If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
userspace the full ESR and ask it to sort it out. I doubt it will be
able to, but hey, maybe it is worth a shot. This would need to be a
different exit reason though, as NISV is explicitly for non-memslot
stuff.

In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
S1 PTW.

>
> > +		 * The drawback is that we end-up fauling twice if the
> 
> typo: s/fauling/faulting/
> 
> > +		 * guest is using any of HW AF/DB: a translation fault
> > +		 * to map the page containing the PT (read only at
> > +		 * first), then a permission fault to allow the flags
> > +		 * to be set.
> > +		 */
> > +		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > +		case ESR_ELx_FSC_PERM:
> > +			return true;
> > +		default:
> > +			return false;
> > +		}
> > +	}
> >  
> >  	if (kvm_vcpu_trap_is_iabt(vcpu))
> >  		return false;
> > -- 
> > 2.34.1
> > 
> 
> Besides the changelog/comment suggestions, the patch looks good to me.
> 
> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

Thanks for the quick review! I'll wait a bit before respinning the
series, as I'd like to get closure on the UAPI point you have raised.

Cheers,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-21  9:35     ` Marc Zyngier
@ 2022-12-21 16:50       ` Oliver Upton
  2022-12-21 17:53         ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2022-12-21 16:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Ard Biesheuvel, Will Deacon,
	Quentin Perret, stable

On Wed, Dec 21, 2022 at 09:35:06AM +0000, Marc Zyngier wrote:

[...]

> > > +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > > +		/*
> > > +		 * Only a permission fault on a S1PTW should be
> > > +		 * considered as a write. Otherwise, page tables baked
> > > +		 * in a read-only memslot will result in an exception
> > > +		 * being delivered in the guest.
> > 
> > Somewhat of a tangent, but:
> > 
> > Aren't we somewhat unaligned with the KVM UAPI by injecting an
> > exception in this case? I know we've been doing it for a while, but it
> > flies in the face of the rules outlined in the
> > KVM_SET_USER_MEMORY_REGION documentation.
> 
> That's an interesting point, and I certainly haven't considered that
> for faults introduced by page table walks.
> 
> I'm not sure what userspace can do with that though. The problem is
> that this is a write for which we don't have useful data: although we
> know it is a page-table walker access, we don't know what it was about
> to write. The instruction that caused the write is meaningless (it
> could either be a load, a store, or an instruction fetch). How do you
> populate the data[] field then?
> 
> If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
> userspace the full ESR and ask it to sort it out. I doubt it will be
> able to, but hey, maybe it is worth a shot. This would need to be a
> different exit reason though, as NISV is explicitly for non-memslot
> stuff.
> 
> In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
> reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
> S1 PTW.

Oh I completely agree with you here. I probably should have said before,
I think the exit would be useless anyway. Getting the documentation in
line with the intended behavior seems to be the best fix.

> >
> > > +		 * The drawback is that we end-up fauling twice if the
> > 
> > typo: s/fauling/faulting/
> > 
> > > +		 * guest is using any of HW AF/DB: a translation fault
> > > +		 * to map the page containing the PT (read only at
> > > +		 * first), then a permission fault to allow the flags
> > > +		 * to be set.
> > > +		 */
> > > +		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > > +		case ESR_ELx_FSC_PERM:
> > > +			return true;
> > > +		default:
> > > +			return false;
> > > +		}
> > > +	}
> > >  
> > >  	if (kvm_vcpu_trap_is_iabt(vcpu))
> > >  		return false;
> > > -- 
> > > 2.34.1
> > > 
> > 
> > Besides the changelog/comment suggestions, the patch looks good to me.
> > 
> > Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
> 
> Thanks for the quick review! I'll wait a bit before respinning the
> series, as I'd like to get closure on the UAPI point you have raised.

I'm satisfied if you are :)

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-21 16:50       ` Oliver Upton
@ 2022-12-21 17:53         ` Marc Zyngier
  2022-12-21 18:26           ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2022-12-21 17:53 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Ard Biesheuvel, Will Deacon,
	Quentin Perret, stable

On Wed, 21 Dec 2022 16:50:30 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Wed, Dec 21, 2022 at 09:35:06AM +0000, Marc Zyngier wrote:
> 
> [...]
> 
> > > > +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > > > +		/*
> > > > +		 * Only a permission fault on a S1PTW should be
> > > > +		 * considered as a write. Otherwise, page tables baked
> > > > +		 * in a read-only memslot will result in an exception
> > > > +		 * being delivered in the guest.
> > > 
> > > Somewhat of a tangent, but:
> > > 
> > > Aren't we somewhat unaligned with the KVM UAPI by injecting an
> > > exception in this case? I know we've been doing it for a while, but it
> > > flies in the face of the rules outlined in the
> > > KVM_SET_USER_MEMORY_REGION documentation.
> > 
> > That's an interesting point, and I certainly haven't considered that
> > for faults introduced by page table walks.
> > 
> > I'm not sure what userspace can do with that though. The problem is
> > that this is a write for which we don't have useful data: although we
> > know it is a page-table walker access, we don't know what it was about
> > to write. The instruction that caused the write is meaningless (it
> > could either be a load, a store, or an instruction fetch). How do you
> > populate the data[] field then?
> > 
> > If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
> > userspace the full ESR and ask it to sort it out. I doubt it will be
> > able to, but hey, maybe it is worth a shot. This would need to be a
> > different exit reason though, as NISV is explicitly for non-memslot
> > stuff.
> > 
> > In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
> > reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
> > S1 PTW.
> 
> Oh I completely agree with you here. I probably should have said before,
> I think the exit would be useless anyway. Getting the documentation in
> line with the intended behavior seems to be the best fix.

Right. How about something like this?

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 226b40baffb8..72abd018a618 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1381,6 +1381,14 @@ It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
 The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
 allocation and is deprecated.
 
+Note: On arm64, a write generated by the page-table walker (to update
+the Access and Dirty flags, for example) never results in a
+KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
+is because KVM cannot provide the data that would be written by the
+page-table walker, making it impossible to emulate the access.
+Instead, an abort (data abort if the cause of the page-table update
+was a load or a store, instruction abort if it was an instruction
+fetch) is injected in the guest.
 
 4.36 KVM_SET_TSS_ADDR
 ---------------------

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-21 17:53         ` Marc Zyngier
@ 2022-12-21 18:26           ` Oliver Upton
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2022-12-21 18:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Ard Biesheuvel, Will Deacon,
	Quentin Perret, stable

On Wed, Dec 21, 2022 at 05:53:58PM +0000, Marc Zyngier wrote:
> On Wed, 21 Dec 2022 16:50:30 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > On Wed, Dec 21, 2022 at 09:35:06AM +0000, Marc Zyngier wrote:
> > 
> > [...]
> > 
> > > > > +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > > > > +		/*
> > > > > +		 * Only a permission fault on a S1PTW should be
> > > > > +		 * considered as a write. Otherwise, page tables baked
> > > > > +		 * in a read-only memslot will result in an exception
> > > > > +		 * being delivered in the guest.
> > > > 
> > > > Somewhat of a tangent, but:
> > > > 
> > > > Aren't we somewhat unaligned with the KVM UAPI by injecting an
> > > > exception in this case? I know we've been doing it for a while, but it
> > > > flies in the face of the rules outlined in the
> > > > KVM_SET_USER_MEMORY_REGION documentation.
> > > 
> > > That's an interesting point, and I certainly haven't considered that
> > > for faults introduced by page table walks.
> > > 
> > > I'm not sure what userspace can do with that though. The problem is
> > > that this is a write for which we don't have useful data: although we
> > > know it is a page-table walker access, we don't know what it was about
> > > to write. The instruction that caused the write is meaningless (it
> > > could either be a load, a store, or an instruction fetch). How do you
> > > populate the data[] field then?
> > > 
> > > If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
> > > userspace the full ESR and ask it to sort it out. I doubt it will be
> > > able to, but hey, maybe it is worth a shot. This would need to be a
> > > different exit reason though, as NISV is explicitly for non-memslot
> > > stuff.
> > > 
> > > In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
> > > reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
> > > S1 PTW.
> > 
> > Oh I completely agree with you here. I probably should have said before,
> > I think the exit would be useless anyway. Getting the documentation in
> > line with the intended behavior seems to be the best fix.
> 
> Right. How about something like this?

Looks good to me, thanks!

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 226b40baffb8..72abd018a618 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1381,6 +1381,14 @@ It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
>  The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
>  allocation and is deprecated.
>  
> +Note: On arm64, a write generated by the page-table walker (to update
> +the Access and Dirty flags, for example) never results in a
> +KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
> +is because KVM cannot provide the data that would be written by the
> +page-table walker, making it impossible to emulate the access.
> +Instead, an abort (data abort if the cause of the page-table update
> +was a load or a store, instruction abort if it was an instruction
> +fetch) is injected in the guest.
>  
>  4.36 KVM_SET_TSS_ADDR
>  ---------------------

--
Best,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-20 20:09 ` [PATCH 1/3] KVM: arm64: Fix S1PTW handling " Marc Zyngier
  2022-12-20 21:47   ` Oliver Upton
@ 2022-12-22 13:01   ` Ard Biesheuvel
  2022-12-24 12:18     ` Marc Zyngier
  1 sibling, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2022-12-22 13:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Oliver Upton, Will Deacon,
	Quentin Perret, stable

On Tue, 20 Dec 2022 at 21:09, Marc Zyngier <maz@kernel.org> wrote:
>
> A recent development on the EFI front has resulted in guests having
> their page tables baked in the firmware binary, and mapped into
> the IPA space as part as a read-only memslot.
>
> Not only this is legitimate, but it also results in added security,
> so thumbs up. However, this clashes mildly with our handling of a S1PTW
> as a write to correctly handle AF/DB updates to the S1 PTs, and results
> in the guest taking an abort it won't recover from (the PTs mapping the
> vectors will suffer freom the same problem...).
>
> So clearly our handling is... wrong.
>
> Instead, switch to a two-pronged approach:
>
> - On S1PTW translation fault, handle the fault as a read
>
> - On S1PTW permission fault, handle the fault as a write
>
> This is of no consequence to SW that *writes* to its PTs (the write
> will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> use AF/DB anyway, as that'd be wrong.
>
> Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> fault on S1PTW permission fault on instruction fetch") do we end-up
> with two back-to-back faults (page being evicted and faulted back).
> I don't think this is a case worth optimising for.
>
> Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>

I have tested this patch on my TX2 with one of the EFI builds in
question, and everything works as before (I never observed the issue
itself)

Regression-tested-by: Ard Biesheuvel <ardb@kernel.org>

For the record, the EFI build in question targets QEMU/mach-virt and
switches to a set of read-only page tables in emulated NOR flash
straight out of reset, so it can create and populate the real page
tables with MMU and caches enabled. EFI does not use virtual memory or
paging so managing access flags or dirty bits in hardware is unlikely
to add any value, and it is not being used at the moment. And given
that this is emulated NOR flash, any ordinary write to it tears down
the r/o memslot altogether, and kicks the NOR flash emulation in QEMU
into programming mode, which is fully based on MMIO emulation and does
not use a memslot at all. IOW, even if we could figure out what store
the PTW was attempting to do, it is always going to be rejected since
the r/o page tables can only be modified by 'programming' the NOR
flash sector.


> ---
>  arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 9bdba47f7e14..fd6ad8b21f85 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
> -       if (kvm_vcpu_abt_iss1tw(vcpu))
> -               return true;
> +       if (kvm_vcpu_abt_iss1tw(vcpu)) {
> +               /*
> +                * Only a permission fault on a S1PTW should be
> +                * considered as a write. Otherwise, page tables baked
> +                * in a read-only memslot will result in an exception
> +                * being delivered in the guest.
> +                *
> +                * The drawback is that we end-up fauling twice if the
> +                * guest is using any of HW AF/DB: a translation fault
> +                * to map the page containing the PT (read only at
> +                * first), then a permission fault to allow the flags
> +                * to be set.
> +                */
> +               switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> +               case ESR_ELx_FSC_PERM:
> +                       return true;
> +               default:
> +                       return false;
> +               }
> +       }
>
>         if (kvm_vcpu_trap_is_iabt(vcpu))
>                 return false;
> --
> 2.34.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-22 13:01   ` Ard Biesheuvel
@ 2022-12-24 12:18     ` Marc Zyngier
  2022-12-24 13:09       ` Ard Biesheuvel
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2022-12-24 12:18 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Oliver Upton, Will Deacon,
	Quentin Perret, stable

On Thu, 22 Dec 2022 13:01:55 +0000,
Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> On Tue, 20 Dec 2022 at 21:09, Marc Zyngier <maz@kernel.org> wrote:
> >
> > A recent development on the EFI front has resulted in guests having
> > their page tables baked in the firmware binary, and mapped into
> > the IPA space as part as a read-only memslot.
> >
> > Not only this is legitimate, but it also results in added security,
> > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > in the guest taking an abort it won't recover from (the PTs mapping the
> > vectors will suffer freom the same problem...).
> >
> > So clearly our handling is... wrong.
> >
> > Instead, switch to a two-pronged approach:
> >
> > - On S1PTW translation fault, handle the fault as a read
> >
> > - On S1PTW permission fault, handle the fault as a write
> >
> > This is of no consequence to SW that *writes* to its PTs (the write
> > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > use AF/DB anyway, as that'd be wrong.
> >
> > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > fault on S1PTW permission fault on instruction fetch") do we end-up
> > with two back-to-back faults (page being evicted and faulted back).
> > I don't think this is a case worth optimising for.
> >
> > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: stable@vger.kernel.org
> 
> Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> 
> I have tested this patch on my TX2 with one of the EFI builds in
> question, and everything works as before (I never observed the issue
> itself)

If you get the chance, could you try with non-4kB page sizes? Here, I
could only reproduce it with 16kB pages. It was firing like clockwork
on Cortex-A55 with that.

> 
> Regression-tested-by: Ard Biesheuvel <ardb@kernel.org>
> 
> For the record, the EFI build in question targets QEMU/mach-virt and
> switches to a set of read-only page tables in emulated NOR flash
> straight out of reset, so it can create and populate the real page
> tables with MMU and caches enabled. EFI does not use virtual memory or
> paging so managing access flags or dirty bits in hardware is unlikely
> to add any value, and it is not being used at the moment. And given
> that this is emulated NOR flash, any ordinary write to it tears down
> the r/o memslot altogether, and kicks the NOR flash emulation in QEMU
> into programming mode, which is fully based on MMIO emulation and does
> not use a memslot at all. IOW, even if we could figure out what store
> the PTW was attempting to do, it is always going to be rejected since
> the r/o page tables can only be modified by 'programming' the NOR
> flash sector.

Indeed, and this would be a pretty dodgy setup anyway.

Thanks for having had a look,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
  2022-12-24 12:18     ` Marc Zyngier
@ 2022-12-24 13:09       ` Ard Biesheuvel
  0 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2022-12-24 13:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, James Morse,
	Suzuki K Poulose, Alexandru Elisei, Oliver Upton, Will Deacon,
	Quentin Perret, stable

On Sat, 24 Dec 2022 at 13:19, Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 22 Dec 2022 13:01:55 +0000,
> Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Tue, 20 Dec 2022 at 21:09, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > A recent development on the EFI front has resulted in guests having
> > > their page tables baked in the firmware binary, and mapped into
> > > the IPA space as part as a read-only memslot.
> > >
> > > Not only this is legitimate, but it also results in added security,
> > > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > > in the guest taking an abort it won't recover from (the PTs mapping the
> > > vectors will suffer freom the same problem...).
> > >
> > > So clearly our handling is... wrong.
> > >
> > > Instead, switch to a two-pronged approach:
> > >
> > > - On S1PTW translation fault, handle the fault as a read
> > >
> > > - On S1PTW permission fault, handle the fault as a write
> > >
> > > This is of no consequence to SW that *writes* to its PTs (the write
> > > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > > use AF/DB anyway, as that'd be wrong.
> > >
> > > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > > fault on S1PTW permission fault on instruction fetch") do we end-up
> > > with two back-to-back faults (page being evicted and faulted back).
> > > I don't think this is a case worth optimising for.
> > >
> > > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > Cc: stable@vger.kernel.org
> >
> > Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
> >
> > I have tested this patch on my TX2 with one of the EFI builds in
> > question, and everything works as before (I never observed the issue
> > itself)
>
> If you get the chance, could you try with non-4kB page sizes? Here, I
> could only reproduce it with 16kB pages. It was firing like clockwork
> on Cortex-A55 with that.
>

I'll try on 64k but I don't have access to a 16k capable machine that
runs KVM atm (I'm still enjoying working wifi and GPU etc on my M1
Macbook Air)

> >
> > Regression-tested-by: Ard Biesheuvel <ardb@kernel.org>
> >
> > For the record, the EFI build in question targets QEMU/mach-virt and
> > switches to a set of read-only page tables in emulated NOR flash
> > straight out of reset, so it can create and populate the real page
> > tables with MMU and caches enabled. EFI does not use virtual memory or
> > paging so managing access flags or dirty bits in hardware is unlikely
> > to add any value, and it is not being used at the moment. And given
> > that this is emulated NOR flash, any ordinary write to it tears down
> > the r/o memslot altogether, and kicks the NOR flash emulation in QEMU
> > into programming mode, which is fully based on MMIO emulation and does
> > not use a memslot at all. IOW, even if we could figure out what store
> > the PTW was attempting to do, it is always going to be rejected since
> > the r/o page tables can only be modified by 'programming' the NOR
> > flash sector.
>
> Indeed, and this would be a pretty dodgy setup anyway.
>
> Thanks for having had a look,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-20 20:09 [PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots Marc Zyngier
  2022-12-20 20:09 ` [PATCH 1/3] KVM: arm64: Fix S1PTW handling " Marc Zyngier
@ 2022-12-20 20:09 ` Marc Zyngier
  2022-12-21 16:46   ` Ricardo Koller
  2022-12-20 20:09 ` [PATCH 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* Marc Zyngier
  2 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2022-12-20 20:09 UTC (permalink / raw)
  To: kvmarm, kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
	Ard Biesheuvel, Will Deacon, Quentin Perret

As a minor optimisation, we can retrofit the "S1PTW is a write
even on translation fault" concept *if* the vcpu is using the
HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
to result in an update of the PTE.

However, we cannot do the same thing for DB, as it would require
us to parse the PTs to find out if the DBM bit is set there.
This is not going to happen.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index fd6ad8b21f85..4ee467065042 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
 	if (kvm_vcpu_abt_iss1tw(vcpu)) {
+		unsigned int afdb;
+		u64 mmfr1;
+
 		/*
 		 * Only a permission fault on a S1PTW should be
 		 * considered as a write. Otherwise, page tables baked
@@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 		 * to map the page containing the PT (read only at
 		 * first), then a permission fault to allow the flags
 		 * to be set.
+		 *
+		 * We can improve things if the guest uses AF, as this
+		 * is guaranteed to result in a write to the PTE. For
+		 * DB, however, we'd need to parse the guest's PTs,
+		 * and that's not on. DB is crap anyway.
 		 */
 		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
 		case ESR_ELx_FSC_PERM:
 			return true;
 		default:
-			return false;
+			/* Can't introspect TCR_EL1 with pKVM */
+			if (kvm_vm_is_protected(vcpu->kvm))
+				return false;
+
+			mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+			afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
+
+			if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
+				return false;
+
+			return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
 		}
 	}
 
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-20 20:09 ` [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write Marc Zyngier
@ 2022-12-21 16:46   ` Ricardo Koller
  2022-12-21 17:43     ` Marc Zyngier
  2022-12-21 17:46     ` Oliver Upton
  0 siblings, 2 replies; 20+ messages in thread
From: Ricardo Koller @ 2022-12-21 16:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, Will Deacon

Hello,

On Tue, Dec 20, 2022 at 08:09:22PM +0000, Marc Zyngier wrote:
> As a minor optimisation, we can retrofit the "S1PTW is a write
> even on translation fault" concept *if* the vcpu is using the
> HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> to result in an update of the PTE.
> 
> However, we cannot do the same thing for DB, as it would require
> us to parse the PTs to find out if the DBM bit is set there.
> This is not going to happen.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index fd6ad8b21f85..4ee467065042 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
>  	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> +		unsigned int afdb;
> +		u64 mmfr1;
> +
>  		/*
>  		 * Only a permission fault on a S1PTW should be
>  		 * considered as a write. Otherwise, page tables baked
> @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  		 * to map the page containing the PT (read only at
>  		 * first), then a permission fault to allow the flags
>  		 * to be set.
> +		 *
> +		 * We can improve things if the guest uses AF, as this
> +		 * is guaranteed to result in a write to the PTE. For
> +		 * DB, however, we'd need to parse the guest's PTs,
> +		 * and that's not on. DB is crap anyway.
>  		 */
>  		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {

Nit: fault_status is calculated once when taking the fault, and passed
around to all users (like user_mem_abort()). Not sure if this is because
of the extra cycles needed to get it, or just style. Anyway, maybe it
applies here.

>  		case ESR_ELx_FSC_PERM:
>  			return true;
>  		default:
> -			return false;
> +			/* Can't introspect TCR_EL1 with pKVM */
> +			if (kvm_vm_is_protected(vcpu->kvm))
> +				return false;
> +
> +			mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> +			afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> +
> +			if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> +				return false;
> +
> +			return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);

Also tested this specific case using page_fault_test when the PT page is
marked for dirty logging with and without AF. In both cases there's a
single _FSC_FAULT (no PERM_FAUT) as expected, and the PT page is marked dirty
in the AF case. The RO and UFFD cases also work as expected.

Need to send some changes for page_fault_test as many tests assume that
any S1PTW is always a PT write, and are failing. Also need to add some new
tests for PTs in RO memslots (as it didn't make much sense before this
change).

>  		}
>  	}
>  
> -- 
> 2.34.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Reviewed-by: Ricardo Koller <ricarkol@google.com>

Thanks,
Ricardo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-21 16:46   ` Ricardo Koller
@ 2022-12-21 17:43     ` Marc Zyngier
  2022-12-23  0:33       ` Ricardo Koller
  2022-12-21 17:46     ` Oliver Upton
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2022-12-21 17:43 UTC (permalink / raw)
  To: Ricardo Koller; +Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, Will Deacon

Hi Ricardo,

On Wed, 21 Dec 2022 16:46:06 +0000,
Ricardo Koller <ricarkol@google.com> wrote:
> 
> Hello,
> 
> On Tue, Dec 20, 2022 at 08:09:22PM +0000, Marc Zyngier wrote:
> > As a minor optimisation, we can retrofit the "S1PTW is a write
> > even on translation fault" concept *if* the vcpu is using the
> > HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> > to result in an update of the PTE.
> > 
> > However, we cannot do the same thing for DB, as it would require
> > us to parse the PTs to find out if the DBM bit is set there.
> > This is not going to happen.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index fd6ad8b21f85..4ee467065042 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
> >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  {
> >  	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +		unsigned int afdb;
> > +		u64 mmfr1;
> > +
> >  		/*
> >  		 * Only a permission fault on a S1PTW should be
> >  		 * considered as a write. Otherwise, page tables baked
> > @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  		 * to map the page containing the PT (read only at
> >  		 * first), then a permission fault to allow the flags
> >  		 * to be set.
> > +		 *
> > +		 * We can improve things if the guest uses AF, as this
> > +		 * is guaranteed to result in a write to the PTE. For
> > +		 * DB, however, we'd need to parse the guest's PTs,
> > +		 * and that's not on. DB is crap anyway.
> >  		 */
> >  		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> 
> Nit: fault_status is calculated once when taking the fault, and passed
> around to all users (like user_mem_abort()). Not sure if this is because
> of the extra cycles needed to get it, or just style. Anyway, maybe it
> applies here.

All these things are just fields in ESR_EL2, which we keep looking at
all the time. The compiler actually does a pretty good job at keeping
that around, specially considering that this function is inlined (at
least here, kvm_handle_guest_abort and kvm_user_mem_abort are merged
into a single monster).

So passing the parameter wouldn't change a thing, and I find the above
more readable (I know that all the information in this function are
derived from the same data structure).

> 
> >  		case ESR_ELx_FSC_PERM:
> >  			return true;
> >  		default:
> > -			return false;
> > +			/* Can't introspect TCR_EL1 with pKVM */
> > +			if (kvm_vm_is_protected(vcpu->kvm))
> > +				return false;
> > +
> > +			mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > +			afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > +
> > +			if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > +				return false;
> > +
> > +			return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> 
> Also tested this specific case using page_fault_test when the PT page is
> marked for dirty logging with and without AF. In both cases there's a
> single _FSC_FAULT (no PERM_FAUT) as expected, and the PT page is marked dirty
> in the AF case. The RO and UFFD cases also work as expected.

Ah, thanks for checking this.

> 
> Need to send some changes for page_fault_test as many tests assume that
> any S1PTW is always a PT write, and are failing. Also need to add some new
> tests for PTs in RO memslots (as it didn't make much sense before this
> change).

I think this is what I really quite didn't grok in these tests. They
seem to verify the KVM behaviour, which is not what we should check
for.

Instead, we should check for the architectural behaviour, which is
that if HAFDBS is enabled, we can observe updates to the PTs even when
we do not write to them directly.

> 
> >  		}
> >  	}
> >  
> > -- 
> > 2.34.1
> > 
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 
> Reviewed-by: Ricardo Koller <ricarkol@google.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-21 17:43     ` Marc Zyngier
@ 2022-12-23  0:33       ` Ricardo Koller
  0 siblings, 0 replies; 20+ messages in thread
From: Ricardo Koller @ 2022-12-23  0:33 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, kvmarm, kvm, linux-arm-kernel, Will Deacon

Hi Marc,

On Wed, Dec 21, 2022 at 05:43:03PM +0000, Marc Zyngier wrote:
> Hi Ricardo,
> 
> On Wed, 21 Dec 2022 16:46:06 +0000,
> Ricardo Koller <ricarkol@google.com> wrote:
> > 
> > Hello,
> > 
> > On Tue, Dec 20, 2022 at 08:09:22PM +0000, Marc Zyngier wrote:
> > > As a minor optimisation, we can retrofit the "S1PTW is a write
> > > even on translation fault" concept *if* the vcpu is using the
> > > HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> > > to result in an update of the PTE.
> > > 
> > > However, we cannot do the same thing for DB, as it would require
> > > us to parse the PTs to find out if the DBM bit is set there.
> > > This is not going to happen.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/kvm_emulate.h | 20 +++++++++++++++++++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > index fd6ad8b21f85..4ee467065042 100644
> > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
> > >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> > >  {
> > >  	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > > +		unsigned int afdb;
> > > +		u64 mmfr1;
> > > +
> > >  		/*
> > >  		 * Only a permission fault on a S1PTW should be
> > >  		 * considered as a write. Otherwise, page tables baked
> > > @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> > >  		 * to map the page containing the PT (read only at
> > >  		 * first), then a permission fault to allow the flags
> > >  		 * to be set.
> > > +		 *
> > > +		 * We can improve things if the guest uses AF, as this
> > > +		 * is guaranteed to result in a write to the PTE. For
> > > +		 * DB, however, we'd need to parse the guest's PTs,
> > > +		 * and that's not on. DB is crap anyway.
> > >  		 */
> > >  		switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > 
> > Nit: fault_status is calculated once when taking the fault, and passed
> > around to all users (like user_mem_abort()). Not sure if this is because
> > of the extra cycles needed to get it, or just style. Anyway, maybe it
> > applies here.
> 
> All these things are just fields in ESR_EL2, which we keep looking at
> all the time. The compiler actually does a pretty good job at keeping
> that around, specially considering that this function is inlined (at
> least here, kvm_handle_guest_abort and kvm_user_mem_abort are merged
> into a single monster).
> 
> So passing the parameter wouldn't change a thing, and I find the above
> more readable (I know that all the information in this function are
> derived from the same data structure).
>

Got it, thanks for the info.

> > 
> > >  		case ESR_ELx_FSC_PERM:
> > >  			return true;
> > >  		default:
> > > -			return false;
> > > +			/* Can't introspect TCR_EL1 with pKVM */
> > > +			if (kvm_vm_is_protected(vcpu->kvm))
> > > +				return false;
> > > +
> > > +			mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > +			afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > > +
> > > +			if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > > +				return false;
> > > +
> > > +			return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> > 
> > Also tested this specific case using page_fault_test when the PT page is
> > marked for dirty logging with and without AF. In both cases there's a
> > single _FSC_FAULT (no PERM_FAUT) as expected, and the PT page is marked dirty
> > in the AF case. The RO and UFFD cases also work as expected.
> 
> Ah, thanks for checking this.
> 
> > 
> > Need to send some changes for page_fault_test as many tests assume that
> > any S1PTW is always a PT write, and are failing. Also need to add some new
> > tests for PTs in RO memslots (as it didn't make much sense before this
> > change).
> 
> I think this is what I really quite didn't grok in these tests. They
> seem to verify the KVM behaviour, which is not what we should check
> for.
> 
> Instead, we should check for the architectural behaviour, which is
> that if HAFDBS is enabled, we can observe updates to the PTs even when
> we do not write to them directly.

There are some tests checking that case (e.g., AF set by HW), but they
also do it while interacting with dirty-logging, userfaultfd, and/or RO
memslots. Some checks are clearly dealing with architectural behavior,
while others are not that clear. Let me use this sample test to get more
specific.  This test deals with HW setting the AF bit on a punched hole
backed by userfaultfd:

TEST_UFFD(guest_exec, with_af, CMD_HOLE_PT, ...):
	1. set TCR_EL1.HA and clear the AF in the test-data PTE,
	2. punch a hole on the test-data PTE page, and register it for
	   userfaultfd,
	3. execute code in the test-data page; this triggers a S1PTW,
	4. assert that there is a userfaultfd _write_ fault on the PT page,
	5. assert that the test-data instruction was executed,
	6. assert that the AF is set on the test-data PTE,

IIUC, only checking for architectural behavior implies skipping step 4.
And I agree, it might be a good idea to skip 4, as it actually depends
on whether the kernel has only the previous commit, or both the previous
and this one. The fault is a _write_ at this commit, and a _read_ at the
previous.

The issue is that there are some cases where checking KVM behavior could
be useful. For example, this dirty_logging test can also help (besides
testing dirty-logging) for checking regressions of commit c4ad98e4b72c
("KVM: arm64: Assume write fault on S1PTW permission fault on
instruction fetch"):

TEST_DIRTY_LOG(guest_exec, with_af, ...)
	1. set TCR_EL1.HA and clear the AF in the test-data PTE,
	2. enable dirty logging on the PT memslot,
	3. execute code in the test-data page; this triggers a S1PTW,
	4. assert that the test-data PTE page is dirty on the log,
	5. assert that the test-data instruction was executed,
	6. assert that the AF is set on the test-data PTE,

Step 4 above is not exactly checking architectural behavior, but I think
it's still useful.

So, regarding what to do with page_fault_test. As a start, I need to go
through all tests and make sure they pass at both this and the previous
commit. Next, I have to identify other tests that also need to be
relaxed a bit (removing some test asserts).

Thanks,
Ricardo

> 
> > 
> > >  		}
> > >  	}
> > >  
> > > -- 
> > > 2.34.1
> > > 
> > > _______________________________________________
> > > kvmarm mailing list
> > > kvmarm@lists.cs.columbia.edu
> > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> > 
> > Reviewed-by: Ricardo Koller <ricarkol@google.com>
> 
> Thanks!
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-21 16:46   ` Ricardo Koller
  2022-12-21 17:43     ` Marc Zyngier
@ 2022-12-21 17:46     ` Oliver Upton
  2022-12-22  9:01       ` Marc Zyngier
  1 sibling, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2022-12-21 17:46 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: Marc Zyngier, kvmarm, kvmarm, kvm, linux-arm-kernel, Will Deacon

On Wed, Dec 21, 2022 at 08:46:06AM -0800, Ricardo Koller wrote:

[...]

> > -			return false;
> > +			/* Can't introspect TCR_EL1 with pKVM */
> > +			if (kvm_vm_is_protected(vcpu->kvm))
> > +				return false;
> > +
> > +			mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > +			afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > +
> > +			if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > +				return false;
> > +
> > +			return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> 
> Also tested this specific case using page_fault_test when the PT page is
> marked for dirty logging with and without AF. In both cases there's a
> single _FSC_FAULT (no PERM_FAUT) as expected, and the PT page is marked dirty
> in the AF case. The RO and UFFD cases also work as expected.
> 
> Need to send some changes for page_fault_test as many tests assume that
> any S1PTW is always a PT write, and are failing. Also need to add some new
> tests for PTs in RO memslots (as it didn't make much sense before this
> change).

So I actually wanted to bring up the issue of user visibility, glad your
test picked up something.

This has two implications, which are rather odd.

 - When UFFD is in use, translation faults are reported to userspace as
   writes when from a RW memslot and reads when from an RO memslot.

 - S1 page table memory is spuriously marked as dirty, as we presume a
   write immediately follows the translation fault. That isn't entirely
   senseless, as it would mean both the target page and the S1 PT that
   maps it are both old. This is nothing new I suppose, just weird.

Marc, do you have any concerns about leaving this as-is for the time
being? At least before we were doing the same thing (write fault) every
time.

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-21 17:46     ` Oliver Upton
@ 2022-12-22  9:01       ` Marc Zyngier
  2022-12-22 20:58         ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2022-12-22  9:01 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Ricardo Koller, kvmarm, kvmarm, kvm, linux-arm-kernel,
	Will Deacon

On Wed, 21 Dec 2022 17:46:24 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Wed, Dec 21, 2022 at 08:46:06AM -0800, Ricardo Koller wrote:
> 
> [...]
> 
> > > -			return false;
> > > +			/* Can't introspect TCR_EL1 with pKVM */
> > > +			if (kvm_vm_is_protected(vcpu->kvm))
> > > +				return false;
> > > +
> > > +			mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > +			afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > > +
> > > +			if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > > +				return false;
> > > +
> > > +			return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> > 
> > Also tested this specific case using page_fault_test when the PT page is
> > marked for dirty logging with and without AF. In both cases there's a
> > single _FSC_FAULT (no PERM_FAUT) as expected, and the PT page is marked dirty
> > in the AF case. The RO and UFFD cases also work as expected.
> > 
> > Need to send some changes for page_fault_test as many tests assume that
> > any S1PTW is always a PT write, and are failing. Also need to add some new
> > tests for PTs in RO memslots (as it didn't make much sense before this
> > change).
> 
> So I actually wanted to bring up the issue of user visibility, glad your
> test picked up something.
> 
> This has two implications, which are rather odd.
> 
>  - When UFFD is in use, translation faults are reported to userspace as
>    writes when from a RW memslot and reads when from an RO memslot.

Not quite: translation faults are reported as reads if TCR_EL1.HA
isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
this matches exactly the behaviour of the page-table walker, which
will update the S1 PTs only if this bit is set.

Or is it what userfaultfd does on its own? That'd be confusing...

> 
>  - S1 page table memory is spuriously marked as dirty, as we presume a
>    write immediately follows the translation fault. That isn't entirely
>    senseless, as it would mean both the target page and the S1 PT that
>    maps it are both old. This is nothing new I suppose, just weird.

s/old/young/ ?

I think you're confusing the PT access with the access that caused the
PT access (I'll have that printed on a t-shirt, thank you very much).

Here, we're not considering the cause of the PT access anymore. If
TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
read, and only that page.

TCR_EL1.HD is what muddies the waters a bit. If it is set without HA
being set, we still handle the translation fault as a read, followed
by a write permission fault. But again, that's solely for the purpose
of the S1 PT. What happens for the mapped page is completely
independent.

> Marc, do you have any concerns about leaving this as-is for the time
> being? At least before we were doing the same thing (write fault) every
> time.

I have the ugly feeling we're talking at cross purpose here, mostly
because I don't get how userfaultfd fits in that picture. Can you shed
some light here?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-22  9:01       ` Marc Zyngier
@ 2022-12-22 20:58         ` Oliver Upton
  2022-12-23  1:00           ` Ricardo Koller
  2022-12-24 11:59           ` Marc Zyngier
  0 siblings, 2 replies; 20+ messages in thread
From: Oliver Upton @ 2022-12-22 20:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ricardo Koller, kvmarm, kvmarm, kvm, linux-arm-kernel,
	Will Deacon

On Thu, Dec 22, 2022 at 09:01:15AM +0000, Marc Zyngier wrote:
> On Wed, 21 Dec 2022 17:46:24 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> >  - When UFFD is in use, translation faults are reported to userspace as
> >    writes when from a RW memslot and reads when from an RO memslot.
> 
> Not quite: translation faults are reported as reads if TCR_EL1.HA
> isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> this matches exactly the behaviour of the page-table walker, which
> will update the S1 PTs only if this bit is set.

My bad, yes you're right. I conflated the use case here with the
architectural state.

I'm probably being way too pedantic, but I just wanted to make sure we
agree about the ensuing subtlety. More below:

> Or is it what userfaultfd does on its own? That'd be confusing...
> 
> > 
> >  - S1 page table memory is spuriously marked as dirty, as we presume a
> >    write immediately follows the translation fault. That isn't entirely
> >    senseless, as it would mean both the target page and the S1 PT that
> >    maps it are both old. This is nothing new I suppose, just weird.
> 
> s/old/young/ ?
> 
> I think you're confusing the PT access with the access that caused the
> PT access (I'll have that printed on a t-shirt, thank you very much).

I'd buy it!

> Here, we're not considering the cause of the PT access anymore. If
> TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> read, and only that page.

I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
a write could possibly follow, but I don't think it requires it. The
page table walker must first load the S1 PTE before writing to it.

From AArch64.S1Translate() (DDI0487H.a):

    (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
								 ss, acctype, iswrite, ispriv);

    [...]

    new_desc = descriptor;
    if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
      // Set descriptor AF bit
      new_desc<10> = '1';

    [...]

    // Either the access flag was clear or AP<2> is set
    if new_desc != descriptor then
      if regime == Regime_EL10 && EL2Enabled() then
        s1aarch64 = TRUE;
	s2fs1walk = TRUE;
	aligned = TRUE;
	iswrite = TRUE;
	(s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
							   ss, s2fs1walk, AccType_ATOMICRW,
							   aligned, iswrite, ispriv);

    if s2fault.statuscode != Fault_None then
      return (s2fault, AddressDescriptor UNKNOWN);
    else
      descupdateaddress = descaddress;

    (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
    						 walkparams.ee, descupdateaddress)

Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
descriptor. The second stage-2 walk for write is conditioned on having
already fetched the stage-1 descriptor and determining the AF needs
to be set.

Relating back to UFFD: if we expect KVM to do exactly what hardware
does, UFFD should see an attempted read when the first walk fails
because of an S2 translation fault. Based on this patch, though, we'd
promote it to a write if TCR_EL1.HA == 1.

This has the additional nuance of marking the S1 PT's IPA as dirty, even
though it might not actually have been written to. Having said that,
the false positive rate should be negligible given that S1 PTs ought to
account for a small amount of guest memory.

Like I said before, I'm probably being unnecessarily pedantic :) It just
seems to me that the view we're giving userspace of S1PTW aborts isn't
exactly architectural and I want to make sure that is explicitly
intentional.

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-22 20:58         ` Oliver Upton
@ 2022-12-23  1:00           ` Ricardo Koller
  2022-12-24 11:59           ` Marc Zyngier
  1 sibling, 0 replies; 20+ messages in thread
From: Ricardo Koller @ 2022-12-23  1:00 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, kvmarm, kvm, linux-arm-kernel, Will Deacon

On Thu, Dec 22, 2022 at 08:58:40PM +0000, Oliver Upton wrote:
> On Thu, Dec 22, 2022 at 09:01:15AM +0000, Marc Zyngier wrote:
> > On Wed, 21 Dec 2022 17:46:24 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > >  - When UFFD is in use, translation faults are reported to userspace as
> > >    writes when from a RW memslot and reads when from an RO memslot.
> > 
> > Not quite: translation faults are reported as reads if TCR_EL1.HA
> > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> > this matches exactly the behaviour of the page-table walker, which
> > will update the S1 PTs only if this bit is set.
> 
> My bad, yes you're right. I conflated the use case here with the
> architectural state.
> 
> I'm probably being way too pedantic, but I just wanted to make sure we
> agree about the ensuing subtlety. More below:
> 
> > Or is it what userfaultfd does on its own? That'd be confusing...
> > 
> > > 
> > >  - S1 page table memory is spuriously marked as dirty, as we presume a
> > >    write immediately follows the translation fault. That isn't entirely
> > >    senseless, as it would mean both the target page and the S1 PT that
> > >    maps it are both old. This is nothing new I suppose, just weird.
> > 
> > s/old/young/ ?
> > 
> > I think you're confusing the PT access with the access that caused the
> > PT access (I'll have that printed on a t-shirt, thank you very much).
> 
> I'd buy it!
> 
> > Here, we're not considering the cause of the PT access anymore. If
> > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> > read, and only that page.
> 
> I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
> a write could possibly follow, but I don't think it requires it. The
> page table walker must first load the S1 PTE before writing to it.
> 
> From AArch64.S1Translate() (DDI0487H.a):
> 
>     (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
> 								 ss, acctype, iswrite, ispriv);
> 
>     [...]
> 
>     new_desc = descriptor;
>     if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
>       // Set descriptor AF bit
>       new_desc<10> = '1';
> 
>     [...]
> 
>     // Either the access flag was clear or AP<2> is set
>     if new_desc != descriptor then
>       if regime == Regime_EL10 && EL2Enabled() then
>         s1aarch64 = TRUE;
> 	s2fs1walk = TRUE;
> 	aligned = TRUE;
> 	iswrite = TRUE;
> 	(s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
> 							   ss, s2fs1walk, AccType_ATOMICRW,
> 							   aligned, iswrite, ispriv);
> 
>     if s2fault.statuscode != Fault_None then
>       return (s2fault, AddressDescriptor UNKNOWN);
>     else
>       descupdateaddress = descaddress;
> 
>     (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
>     						 walkparams.ee, descupdateaddress)
> 
> Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
> descriptor. The second stage-2 walk for write is conditioned on having
> already fetched the stage-1 descriptor and determining the AF needs
> to be set.
> 
> Relating back to UFFD: if we expect KVM to do exactly what hardware
> does, UFFD should see an attempted read when the first walk fails
> because of an S2 translation fault. Based on this patch, though, we'd
> promote it to a write if TCR_EL1.HA == 1.
> 
> This has the additional nuance of marking the S1 PT's IPA as dirty, even
> though it might not actually have been written to. Having said that,
> the false positive rate should be negligible given that S1 PTs ought to
> account for a small amount of guest memory.

Another false positive is TCR_EL1.HA == 1 and having the AF bit set in
the PTE. This results on a write, when I don't think it should.

> 
> Like I said before, I'm probably being unnecessarily pedantic :) It just
> seems to me that the view we're giving userspace of S1PTW aborts isn't
> exactly architectural and I want to make sure that is explicitly
> intentional.
>
> --
> Thanks,
> Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
  2022-12-22 20:58         ` Oliver Upton
  2022-12-23  1:00           ` Ricardo Koller
@ 2022-12-24 11:59           ` Marc Zyngier
  1 sibling, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2022-12-24 11:59 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Ricardo Koller, kvmarm, kvmarm, kvm, linux-arm-kernel,
	Will Deacon

On Thu, 22 Dec 2022 20:58:40 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Thu, Dec 22, 2022 at 09:01:15AM +0000, Marc Zyngier wrote:
> > On Wed, 21 Dec 2022 17:46:24 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:
> > >  - When UFFD is in use, translation faults are reported to userspace as
> > >    writes when from a RW memslot and reads when from an RO memslot.
> > 
> > Not quite: translation faults are reported as reads if TCR_EL1.HA
> > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> > this matches exactly the behaviour of the page-table walker, which
> > will update the S1 PTs only if this bit is set.
> 
> My bad, yes you're right. I conflated the use case here with the
> architectural state.
> 
> I'm probably being way too pedantic, but I just wanted to make sure we
> agree about the ensuing subtlety. More below:
> 
> > Or is it what userfaultfd does on its own? That'd be confusing...
> > 
> > > 
> > >  - S1 page table memory is spuriously marked as dirty, as we presume a
> > >    write immediately follows the translation fault. That isn't entirely
> > >    senseless, as it would mean both the target page and the S1 PT that
> > >    maps it are both old. This is nothing new I suppose, just weird.
> > 
> > s/old/young/ ?
> > 
> > I think you're confusing the PT access with the access that caused the
> > PT access (I'll have that printed on a t-shirt, thank you very much).
> 
> I'd buy it!
> 
> > Here, we're not considering the cause of the PT access anymore. If
> > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> > read, and only that page.
> 
> I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
> a write could possibly follow, but I don't think it requires it. The
> page table walker must first load the S1 PTE before writing to it.

Ah, you're talking of the write to the PTE. Too many writes!

My reasoning is based on Rule LFTXR in DDI0487I.a, which says:

"When the PE performs a hardware update of the AF, it sets the AF to 1
 in the corresponding descriptor in memory, in a coherent manner,
 using an atomic read-modify-write of that descriptor."

An atomic-or operation fits this description, and I cannot see
anything in the architecture that would prevent the write of a PTE
even if AF is already set, such as mandating something like a
test-and-set or compare-and-swap.

I'm not saying this is the only possible implementation, or even a
good one. But I don't think this is incompatible with what the
architecture mandates.

> 
> From AArch64.S1Translate() (DDI0487H.a):
> 
>     (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
> 								 ss, acctype, iswrite, ispriv);
> 
>     [...]
> 
>     new_desc = descriptor;
>     if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
>       // Set descriptor AF bit
>       new_desc<10> = '1';
> 
>     [...]
> 
>     // Either the access flag was clear or AP<2> is set
>     if new_desc != descriptor then
>       if regime == Regime_EL10 && EL2Enabled() then
>         s1aarch64 = TRUE;
> 	s2fs1walk = TRUE;
> 	aligned = TRUE;
> 	iswrite = TRUE;
> 	(s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
> 							   ss, s2fs1walk, AccType_ATOMICRW,
> 							   aligned, iswrite, ispriv);
> 
>     if s2fault.statuscode != Fault_None then
>       return (s2fault, AddressDescriptor UNKNOWN);
>     else
>       descupdateaddress = descaddress;
> 
>     (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
>     						 walkparams.ee, descupdateaddress)
> 
> Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
> descriptor. The second stage-2 walk for write is conditioned on having
> already fetched the stage-1 descriptor and determining the AF needs
> to be set.

The question is whether this is one possible implementation, or the
only possible implementation. My bet is on the former.

> Relating back to UFFD: if we expect KVM to do exactly what hardware
> does, UFFD should see an attempted read when the first walk fails
> because of an S2 translation fault. Based on this patch, though, we'd
> promote it to a write if TCR_EL1.HA == 1.
> 
> This has the additional nuance of marking the S1 PT's IPA as dirty, even
> though it might not actually have been written to. Having said that,
> the false positive rate should be negligible given that S1 PTs ought to
> account for a small amount of guest memory.
> 
> Like I said before, I'm probably being unnecessarily pedantic :) It just
> seems to me that the view we're giving userspace of S1PTW aborts isn't
> exactly architectural and I want to make sure that is explicitly
> intentional.

I think it is perfectly fine to be pedantic about these things,
because they really matter.

I still think that this change doesn't violate the architecture. But
at the same time, I see some value in strictly following what the HW
does, specially given that this only optimises the case where:

- S1 PTs do not have a pre-existing S2 mapping (either placed there by
  the VMM, or swapped out)

- TCR_EL1.HA==1

which is such a corner case that nobody will loose any sleep over it
(and I'll buy beer to anyone who can come up with a real workload
where this optimisation actually matters).

So my suggestion is to drop the change altogether, and stick with the
original fix.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
  2022-12-20 20:09 [PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots Marc Zyngier
  2022-12-20 20:09 ` [PATCH 1/3] KVM: arm64: Fix S1PTW handling " Marc Zyngier
  2022-12-20 20:09 ` [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write Marc Zyngier
@ 2022-12-20 20:09 ` Marc Zyngier
  2 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2022-12-20 20:09 UTC (permalink / raw)
  To: kvmarm, kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
	Ard Biesheuvel, Will Deacon, Quentin Perret

The former is an AArch32 legacy, so let's move over to the
verbose (and strictly identical) version.

This involves moving some of the #defines that were private
to KVM into the more generic esr.h.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/esr.h            |  9 +++++++++
 arch/arm64/include/asm/kvm_arm.h        | 15 ---------------
 arch/arm64/include/asm/kvm_emulate.h    | 20 ++++++++++----------
 arch/arm64/kvm/hyp/include/hyp/fault.h  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/mmu.c                    | 21 ++++++++++++---------
 6 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 15b34fbfca66..206de10524e3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -114,6 +114,15 @@
 #define ESR_ELx_FSC_ACCESS	(0x08)
 #define ESR_ELx_FSC_FAULT	(0x04)
 #define ESR_ELx_FSC_PERM	(0x0C)
+#define ESR_ELx_FSC_SEA_TTW0	(0x14)
+#define ESR_ELx_FSC_SEA_TTW1	(0x15)
+#define ESR_ELx_FSC_SEA_TTW2	(0x16)
+#define ESR_ELx_FSC_SEA_TTW3	(0x17)
+#define ESR_ELx_FSC_SECC	(0x18)
+#define ESR_ELx_FSC_SECC_TTW0	(0x1c)
+#define ESR_ELx_FSC_SECC_TTW1	(0x1d)
+#define ESR_ELx_FSC_SECC_TTW2	(0x1e)
+#define ESR_ELx_FSC_SECC_TTW3	(0x1f)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT	(24)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 0df3fc3a0173..26b0c97df986 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -319,21 +319,6 @@
 				 BIT(18) |		\
 				 GENMASK(16, 15))
 
-/* For compatibility with fault code shared with 32-bit */
-#define FSC_FAULT	ESR_ELx_FSC_FAULT
-#define FSC_ACCESS	ESR_ELx_FSC_ACCESS
-#define FSC_PERM	ESR_ELx_FSC_PERM
-#define FSC_SEA		ESR_ELx_FSC_EXTABT
-#define FSC_SEA_TTW0	(0x14)
-#define FSC_SEA_TTW1	(0x15)
-#define FSC_SEA_TTW2	(0x16)
-#define FSC_SEA_TTW3	(0x17)
-#define FSC_SECC	(0x18)
-#define FSC_SECC_TTW0	(0x1c)
-#define FSC_SECC_TTW1	(0x1d)
-#define FSC_SECC_TTW2	(0x1e)
-#define FSC_SECC_TTW3	(0x1f)
-
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK	(~UL(0xf))
 /*
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 4ee467065042..d67a09c07f98 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -349,16 +349,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *v
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
 	switch (kvm_vcpu_trap_get_fault(vcpu)) {
-	case FSC_SEA:
-	case FSC_SEA_TTW0:
-	case FSC_SEA_TTW1:
-	case FSC_SEA_TTW2:
-	case FSC_SEA_TTW3:
-	case FSC_SECC:
-	case FSC_SECC_TTW0:
-	case FSC_SECC_TTW1:
-	case FSC_SECC_TTW2:
-	case FSC_SECC_TTW3:
+	case ESR_ELx_FSC_EXTABT:
+	case ESR_ELx_FSC_SEA_TTW0:
+	case ESR_ELx_FSC_SEA_TTW1:
+	case ESR_ELx_FSC_SEA_TTW2:
+	case ESR_ELx_FSC_SEA_TTW3:
+	case ESR_ELx_FSC_SECC:
+	case ESR_ELx_FSC_SECC_TTW0:
+	case ESR_ELx_FSC_SECC_TTW1:
+	case ESR_ELx_FSC_SECC_TTW2:
+	case ESR_ELx_FSC_SECC_TTW3:
 		return true;
 	default:
 		return false;
diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 1b8a2dcd712f..9ddcfe2c3e57 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -60,7 +60,7 @@ static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
 	 */
 	if (!(esr & ESR_ELx_S1PTW) &&
 	    (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
-	     (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+	     (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM)) {
 		if (!__translate_far_to_hpfar(far, &hpfar))
 			return false;
 	} else {
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 3330d1b76bdd..07d37ff88a3f 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -367,7 +367,7 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
 		bool valid;
 
-		valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+		valid = kvm_vcpu_trap_get_fault_type(vcpu) == ESR_ELx_FSC_FAULT &&
 			kvm_vcpu_dabt_isvalid(vcpu) &&
 			!kvm_vcpu_abt_issea(vcpu) &&
 			!kvm_vcpu_abt_iss1tw(vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..a3ee3b605c9b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1212,7 +1212,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
 	VM_BUG_ON(write_fault && exec_fault);
 
-	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
+	if (fault_status == ESR_ELx_FSC_PERM && !write_fault && !exec_fault) {
 		kvm_err("Unexpected L2 read permission error\n");
 		return -EFAULT;
 	}
@@ -1277,7 +1277,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
+	if (fault_status != ESR_ELx_FSC_PERM ||
+	    (logging_active && write_fault)) {
 		ret = kvm_mmu_topup_memory_cache(memcache,
 						 kvm_mmu_cache_min_pages(kvm));
 		if (ret)
@@ -1342,7 +1343,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
-		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
+		if (fault_status ==  ESR_ELx_FSC_PERM &&
+		    fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
 			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
@@ -1350,7 +1352,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 								   &fault_ipa);
 	}
 
-	if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
+	if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (kvm_vma_mte_allowed(vma)) {
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);
@@ -1376,7 +1378,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
+	if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	else
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
@@ -1441,7 +1443,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
-	if (fault_status == FSC_FAULT) {
+	if (fault_status == ESR_ELx_FSC_FAULT) {
 		/* Beyond sanitised PARange (which is the IPA limit) */
 		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
 			kvm_inject_size_fault(vcpu);
@@ -1476,8 +1478,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 			      kvm_vcpu_get_hfar(vcpu), fault_ipa);
 
 	/* Check the stage-2 fault is trans. fault or write fault */
-	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-	    fault_status != FSC_ACCESS) {
+	if (fault_status != ESR_ELx_FSC_FAULT &&
+	    fault_status != ESR_ELx_FSC_PERM &&
+	    fault_status != ESR_ELx_FSC_ACCESS) {
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1539,7 +1542,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	/* Userspace should not be able to register out-of-bounds IPAs */
 	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
 
-	if (fault_status == FSC_ACCESS) {
+	if (fault_status == ESR_ELx_FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
 		ret = 1;
 		goto out_unlock;
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-12-24 15:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-20 20:09 [PATCH 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots Marc Zyngier
2022-12-20 20:09 ` [PATCH 1/3] KVM: arm64: Fix S1PTW handling " Marc Zyngier
2022-12-20 21:47   ` Oliver Upton
2022-12-21  9:35     ` Marc Zyngier
2022-12-21 16:50       ` Oliver Upton
2022-12-21 17:53         ` Marc Zyngier
2022-12-21 18:26           ` Oliver Upton
2022-12-22 13:01   ` Ard Biesheuvel
2022-12-24 12:18     ` Marc Zyngier
2022-12-24 13:09       ` Ard Biesheuvel
2022-12-20 20:09 ` [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write Marc Zyngier
2022-12-21 16:46   ` Ricardo Koller
2022-12-21 17:43     ` Marc Zyngier
2022-12-23  0:33       ` Ricardo Koller
2022-12-21 17:46     ` Oliver Upton
2022-12-22  9:01       ` Marc Zyngier
2022-12-22 20:58         ` Oliver Upton
2022-12-23  1:00           ` Ricardo Koller
2022-12-24 11:59           ` Marc Zyngier
2022-12-20 20:09 ` [PATCH 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).