Re: [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set

From: Oliver Upton <oupton@kernel.org>
To: Marc Zyngier <maz@kernel.org>
Cc: sashiko-reviews@lists.linux.dev, kvmarm@lists.linux.dev
Subject: Re: [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set
Date: Thu, 25 Jun 2026 15:18:03 -0700
Message-ID: <aj2pG5ZW67coAxHR@kernel.org>
In-Reply-To: <86cxxeqrjp.wl-maz@kernel.org>

On Thu, Jun 25, 2026 at 09:43:54PM +0100, Marc Zyngier wrote:
> On Thu, 25 Jun 2026 20:34:46 +0100, Oliver Upton <oupton@kernel.org> wrote:
> > We still need to account for host-induced permission faults, e.g. dirty
> > tracking or an RO memslot getting mapped into the L2. So I think we
> > still need to evaluate R+W before forwarding to the L1.
> 
> Hmm. I had forgotten about this indeed. Ultimately, we need to keep
> track of why a S2 entry is RO in the L1 IPA space. We can either use
> more SW bits in the PTE (not that many left), or wait until Wei-Lin is
> done with his reverse + direct map tracking structure.

Pretty sure we can avoid the additional state tracking so long as we
ensure the nested S2 permission checks are ordered before anything
happening 'downstream' of the MMU.

Pairing an R+W check in kvm_s2_handle_perm_fault() along with a change
to kvm_is_write_fault() to only consider HA=1 permission faults to be
writes should do the trick. I've got the following blurb locally:

@@ -918,14 +919,39 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
 	if (!kvm_vcpu_trap_is_permission_fault(vcpu))
 		return 0;
 
-	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+	/*
+	 * S1PTW permission faults are a pain to deal with, owing to the fact that
+	 * the architecture sucks and there's insufficient syndrome information to
+	 * determine if the access failed for read or write permissions. We can still
+	 * infer it based on the behavior of our pseudo-TLB (i.e. the KVM MMU):
+	 *
+	 *  - S1PTW translation faults are treated as read accesses, meaning that
+	 *    the most relaxed resulting translation is read-only. The L1 hypervisor
+	 *    could prevent read accesses in the nested stage-2. Since all TTW accesses
+	 *    require at least read permission, we can detect this by unconditionally
+	 *    checking read permission in the nested stage-2.
+	 *
+	 *  - After establishing that this vCPU observed a read-only translation, we
+	 *    can infer that the access failed for lacking write permission due to
+	 *    either the nested stage-2 or KVM. Evaluate the write permission of the
+	 *    nested stage-2.
+	 *
+	 *  - Once the nested stage-2 permission checks have passed the permission
+	 *    fault must've been due to something downstream; the rest of KVM's
+	 *    fault handling can safely short-circuit to a write access at this point
+	 *    and potentially treat the access as unsupported at the virtual endpoint
+	 *    (e.g. unsupported atomic access to RO memslot).
+	 */
+	if (kvm_vcpu_abt_iss1tw(vcpu)) {
+		forward_fault = !trans->readable;
+		if (write_fault)
+			forward_fault |= !trans->writable;
+	} else if (kvm_vcpu_trap_is_iabt(vcpu)) {
 		if (vcpu_mode_priv(vcpu))
 			forward_fault = !kvm_s2_trans_exec_el1(vcpu->kvm, trans);
 		else
 			forward_fault = !kvm_s2_trans_exec_el0(vcpu->kvm, trans);
 	} else {
-		bool write_fault = kvm_is_write_fault(vcpu);
-
 		forward_fault = ((write_fault && !trans->writable) ||
 				 (!write_fault && !trans->readable));
 	}

I will admit, I have an extreme distaste for how subtle this is + the
reliance on implementation detail.
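
To make the ordering argument easier to poke at, here is a rough standalone
model of the decision in the hunk above. It is only a sketch: every model_*
name is hypothetical and none of it is KVM's actual code. It assumes that
kvm_is_write_fault() reports an S1PTW permission fault as a write only when
HA=1, and it collapses the EL1/EL0 execute checks into a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these are KVM's real definitions. */
enum model_abort { MODEL_S1PTW, MODEL_IABT, MODEL_DABT };

struct model_fault {
	enum model_abort kind;
	bool ha_enabled;	/* assumed: HA=1 for the faulting stage-1 walk */
	bool wnr;		/* write-not-read, only meaningful for MODEL_DABT */
};

struct model_s2_trans {		/* permissions granted by the nested stage-2 */
	bool readable;
	bool writable;
	bool executable;	/* simplification: no EL1/EL0 split */
};

/* Assumed behavior: an S1PTW permission fault is a write only if HA=1. */
static bool model_is_write_fault(const struct model_fault *f)
{
	switch (f->kind) {
	case MODEL_S1PTW:
		return f->ha_enabled;
	case MODEL_IABT:
		return false;
	default:
		return f->wnr;
	}
}

/* Returns true when the permission fault should be forwarded to the L1. */
static bool model_forward_to_l1(const struct model_fault *f,
				const struct model_s2_trans *t)
{
	bool write_fault = model_is_write_fault(f);

	switch (f->kind) {
	case MODEL_S1PTW:
		/* Table walks always need read; write only counts for HA=1. */
		return !t->readable || (write_fault && !t->writable);
	case MODEL_IABT:
		return !t->executable;
	default:
		return write_fault ? !t->writable : !t->readable;
	}
}

/* Walks through the S1PTW cases discussed in this thread. */
void model_selfcheck(void)
{
	struct model_fault ptw_ha = { MODEL_S1PTW, true, false };
	struct model_fault ptw_noha = { MODEL_S1PTW, false, false };
	struct model_s2_trans rw = { true, true, true };
	struct model_s2_trans ro = { true, false, true };
	struct model_s2_trans none = { false, false, false };

	/* L1 denies read: always forwarded, regardless of HA. */
	assert(model_forward_to_l1(&ptw_noha, &none));
	/* L1 grants read only and HA=1: forwarded for lacking write. */
	assert(model_forward_to_l1(&ptw_ha, &ro));
	/* L1 grants read only and HA=0: not a write, nothing to forward. */
	assert(!model_forward_to_l1(&ptw_noha, &ro));
	/* L1 grants R+W: the fault is host-induced and stays in KVM. */
	assert(!model_forward_to_l1(&ptw_ha, &rw));
}
```

The last case is the one that matters for dirty tracking and RO memslots:
once the nested stage-2 grants both read and write, nothing is forwarded and
the fault is left for the rest of KVM's handling.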

> > Looking ahead to HAFDBS, for this to work we will need to use a liberal
> > interpretation of R_JCXVS at the time of the initial translation fault
> > and always walk with intent for write.
> 
> Yeah, I'm not too precious about that, and we might as well take
> advantage of the architecture.
> 
> > 
> > Basically, there seems to be a subtle difference arising between writes
> > as observed from the nested MMU and writes as observed at the virtual
> > endpoint (memslot). Funny how something as straightforward as the access
> > flag can be so headache inducing :)
> 
> Well, you knew NV was just a sorry hack, didn't you? ;-) It's just
> another case of "SW will sort it out eventually...".

Given the present company on NV, I'm not entirely sure SW will actually sort
itself out :)

Thanks,
Oliver
