From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oupton@kernel.org>
Cc: sashiko-reviews@lists.linux.dev, kvmarm@lists.linux.dev
Subject: Re: [PATCH] KVM: arm64: nv: Check RW permissions for insn abort due to S1PTW
Date: Wed, 24 Jun 2026 09:41:26 +0100
Message-ID: <86mrwkqqix.wl-maz@kernel.org>
In-Reply-To: <ajsf1v4X5sJk_Dv5@kernel.org>

On Wed, 24 Jun 2026 01:07:50 +0100,
Oliver Upton <oupton@kernel.org> wrote:
> 
> On Tue, Jun 23, 2026 at 09:32:25PM +0000, sashiko-bot@kernel.org wrote:
> > > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> > > index 94df26de6990c..713fc44f362d9 100644
> > > --- a/arch/arm64/kvm/nested.c
> > > +++ b/arch/arm64/kvm/nested.c
> > [ ... ]
> > > @@ -918,7 +918,7 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
> > >  	if (!kvm_vcpu_trap_is_permission_fault(vcpu))
> > >  		return 0;
> > >  
> > > -	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> > > +	if (kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu)) {
> > >  		if (vcpu_mode_priv(vcpu))
> > >  			forward_fault = !kvm_s2_trans_exec_el1(vcpu->kvm, trans);
> > >  		else
> > 
> > [Severity: High]
> > This isn't a bug introduced by this patch, but does this change expose
> > instruction aborts to an infinite loop if the L1 guest maps nested stage-2
> > memory as Write-Only?
> > 
> > By falling through to the else block in kvm_s2_handle_perm_fault(), S1PTW
> > instruction aborts are now evaluated for read/write permissions.
> > 
> > However, if the L1 guest mapped the nested stage-2 page table entry as
> > Write-Only, the hardware will still fault on the missing read permission
> > during the stage-1 translation table walk.
> > 
> > Looking at kvm_is_write_fault() in arch/arm64/include/asm/kvm_emulate.h:
> > 
> > static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> > {
> >         if (kvm_vcpu_abt_iss1tw(vcpu)) {
> >                 ...
> >                 return kvm_vcpu_trap_is_permission_fault(vcpu);
> >         }
> >         ...
> > }
> > 
> > It unconditionally treats all permission faults on an S1PTW as write faults,
> > assuming they are Access Flag or Dirty Bit updates.
> > 
> > Back in the else block of kvm_s2_handle_perm_fault():
> > 
> >         bool write_fault = kvm_is_write_fault(vcpu);
> > 
> >         forward_fault = ((write_fault && !trans->writable) ||
> >                          (!write_fault && !trans->readable));
> > 
> > Because write_fault is true, it only evaluates !trans->writable. Since the
> > mapping is Write-Only, this check passes and KVM doesn't forward the fault.
> > Instead, KVM resumes the guest, which will immediately fault again on the
> > read, causing an infinite loop.
> 
> Well this isn't exactly malice, just a guest choosing to shoot itself in
> the foot. Having said that, it's pretty clear that the assumption around S1PTW
> permission faults no longer holds in kvm_is_write_fault().
> 
> I think the best we can do is key write faults off of TCR_ELx.HA and
> evaluate potentially both permissions for S1PTW. Needs to be split up
> into a couple patches:
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5bf3d7e1d92c..d5c61e0027c8 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -479,21 +479,12 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> -		/*
> -		 * Only a permission fault on a S1PTW should be
> -		 * considered as a write. Otherwise, page tables baked
> -		 * in a read-only memslot will result in an exception
> -		 * being delivered in the guest.
> -		 *
> -		 * The drawback is that we end-up faulting twice if the
> -		 * guest is using any of HW AF/DB: a translation fault
> -		 * to map the page containing the PT (read only at
> -		 * first), then a permission fault to allow the flags
> -		 * to be set.
> -		 */
> -		return kvm_vcpu_trap_is_permission_fault(vcpu);
> -	}
> +	/*
> +	 * The architecture sucks; assume that the S1PTW fetched for write if
> +	 * HA is enabled at stage-1.
> +	 */
> +	if (kvm_vcpu_abt_iss1tw(vcpu))
> +		return effective_tcr_ha(vcpu);

OK, so you're trading the implicit state machine (translation fault ->
RO, permission fault -> RW) for a direct TCR.HA lookup, because the
only reason you'd get a S1PTW fault for write is if you were updating
the descriptor. This is cute, and I wish I had thought of that, as it
saves us a round trip.

But why can't the same thing happen with HD? Surely a dirty state
update results in a similar fault, and I think we should also evaluate
that bit.
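
To make the intent concrete, here is a rough standalone model of the
classification discussed above. It is only a sketch, not kernel code:
struct fault_model, struct s2_trans_model and both model_*() helpers are
hypothetical names, the booleans stand in for state KVM would derive from
ESR_EL2, the guest's TCR_ELx and the nested stage-2 walk, and folding HD
in next to HA simply follows the question above rather than a settled
design. The S1PTW branch reflects the "evaluate potentially both
permissions" idea from the quoted mail.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the fault being handled. */
struct fault_model {
	bool s1ptw;	/* abort taken on a stage-1 page table walk */
	bool is_write;	/* write indication for ordinary data aborts */
	bool tcr_ha;	/* guest stage-1 TCR_ELx.HA */
	bool tcr_hd;	/* guest stage-1 TCR_ELx.HD */
};

/* Hypothetical stand-in for the permissions of the nested stage-2 mapping. */
struct s2_trans_model {
	bool readable;
	bool writable;
};

/*
 * Assumption: an S1PTW access counts as a write only if the guest enabled
 * hardware updates of the descriptor (HA, and HD as asked above).
 */
static bool model_is_write_fault(const struct fault_model *f)
{
	if (f->s1ptw)
		return f->tcr_ha || f->tcr_hd;

	return f->is_write;
}

/* Returns true if the permission fault should be forwarded to L1. */
static bool model_forward_fault(const struct fault_model *f,
				const struct s2_trans_model *t)
{
	bool write_fault = model_is_write_fault(f);

	/* The walker always reads the descriptor, and may also update it. */
	if (f->s1ptw)
		return !t->readable || (write_fault && !t->writable);

	return (write_fault && !t->writable) ||
	       (!write_fault && !t->readable);
}
```

With this model, a Write-Only nested stage-2 mapping hit by an S1PTW is
forwarded to L1 because the read permission is missing, instead of being
replayed forever, and a read-only mapping is only forwarded when the
guest asked for hardware descriptor updates.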

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
