All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH] KVM: arm64: nv: Allow shadow stage 2 read fault
Date: Fri, 22 Aug 2025 10:40:07 +0100	[thread overview]
Message-ID: <87a53rk83s.wl-maz@kernel.org> (raw)
In-Reply-To: <20250822031853.2007437-1-r09922117@csie.ntu.edu.tw>

Hey Wei-Lin,

On Fri, 22 Aug 2025 04:18:53 +0100,
Wei-Lin Chang <r09922117@csie.ntu.edu.tw> wrote:
> 
> We don't expect to ever encounter a stage-2 read permission fault,
> because read permission is always granted when mapping stage-2. However,
> this isn't certainly the case when NV is involved. The key is the shadow
> mapping built for L2 must have the same or stricter permissions than
> those that L1 built for L2, hence opening the possibility of mappings
> without read permission.
> 
> Let's continue handling the read fault if we're handling a shadow
> stage-2 fault, instead of erroring out.
> 
> The following illustrates an example, vertical lines stand for either
> L0, L1, or L2 running, with left: L0, middle: L1, and right: L2.
> Horizontal arrows stand for traps and erets between them.
> 
> '''
> L0                               L1                               L2
>                                                L2                 |
>                                                writes to an L2 PA |
> |<----------------------------------------------------------------|
> |
> | L0
> | finds no mapping in
> | L1's s2pt for L2
> |
> | L0
> | injects data abort
> |------------------------------->|
>                                  | L1
>                                  | for whatever reason
>                                  | treats this abort specially,
>                                  | only gives it W permission
>                                  |
>        eret traps to L0          |
> |<-------------------------------|
> |
> |      eret back to L2
> |---------------------------------------------------------------->|
>                                                                   |
>                                                    L2             |
>                                                    writes to same |
>                                                    L2 PA (replay) |
> |<----------------------------------------------------------------|
> |
> | L0
> | finds mapping in s2pt that
> | L1 built for L2, create
> | shadow mapping for L2,
> | but only gives it W to match
> | what L1 had given
> |
> |
> |      eret back to L2
> |---------------------------------------------------------------->|
>                                                                   |
>                                                    L2             |
>                                                    writes to same |
>                                                    L2 PA (replay) |
>                                                    success        |
> |<----------------------------------------------------------------|
> |
> |
> |------------------------------->| L1
>                                  | for some reason, relaxes
>                                  | L2's permission from W
>                                  | to RW, AND, doesn't flush
>                                  | TLB
>                                  |
>        eret traps to L0          |
> |<-------------------------------|
> |
> |      eret back to L2
> |---------------------------------------------------------------->|
>                                                                   |
>                                                     L2            |
>                                                     read to L2 PA |
> |<----------------------------------------------------------------|
> | stage-2 read fault
> |
> '''
> 
> Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>

Excellent detective work! Bonus points for the ASCII art -- I'm not
sure how practical it will be to keep it in the final commit, but this
definitely helps understanding the issue.

> ---
> 
> I am able to trigger this error with a modified L1 KVM, but I do realize
> this requires L1 to be very strange (or even just wrong) so I understand
> if we don't want to handle this kind of edge case. On the other hand,
> could there also be other ways to trigger this that I have not thought
> of?
> 
> Another thing is that this change lets L1 get away with not flushing the
> TLB, but TLBs are ephemeral so it's fine in this aspect, however I'm not
> sure if there are other considerations.

Well, it depends whose TLBs we're talking about. The CPU TLBs are
definitely ephemeral. But the KVM-managed TLBs (aka the shadow S2) is
pretty static, and I am a bit worried to relax this.

What would happen on actual HW? L1 would take a S2 fault, because the
TLBs are out of sync with the S2 PTs. And yes, maybe the TLB pressure
woulds make it so that it works *sometimes*, but there wouldn't be any
guarantee of forward progress as long as EL2 doesn't issue a TLBI.

I reckon we should implement that particular behaviour whenever
possible, and handle permission faults early, possibly ahead of the
guest S2 walk (the way we handle VNCR_EL2 faults is IMO a good model).

This would imply taking the guest's S2 permission at face value, and
only drop W permission when the memslot is RO -- you'd then need to
keep track of the original W bit somewhere. And that's where things
become much harder, because KVM can decide to remap arbitrary ranges
of IPA space as RO, which implies we should track the W bit at all
times, most likely as one of the SW bits in the PTE.

We'll need exactly that if we ever want to implement the
Hardware-managed Dirty Bit, but I have the feeling we need an actual
design for this, and not a quick hack. Your approach is therefore the
correct one for the time being.

> 
> ---
>  arch/arm64/kvm/mmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1c78864767c5c..41017ca579b19 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1508,8 +1508,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>  	VM_BUG_ON(write_fault && exec_fault);
>  
> -	if (fault_is_perm && !write_fault && !exec_fault) {
> -		kvm_err("Unexpected L2 read permission error\n");
> +	if (fault_is_perm && !write_fault && !exec_fault && !nested) {
> +		kvm_err("Unexpected S2 read permission error\n");
>  		return -EFAULT;
>  	}
>  

Honestly, I'd rather kill this check altogether rather than adding
another condition to it -- I suggested it to Fuad a while ago. This is
the first time we ever see it firing, and I don't think it is bringing
much.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.

  parent reply	other threads:[~2025-08-22  9:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-22  3:18 [PATCH] KVM: arm64: nv: Allow shadow stage 2 read fault Wei-Lin Chang
2025-08-22  9:25 ` Oliver Upton
2025-08-26 13:55   ` Wei-Lin Chang
2025-08-22  9:40 ` Marc Zyngier [this message]
2025-08-26 13:49   ` Wei-Lin Chang
2025-09-01 11:06     ` Marc Zyngier
2025-09-07  9:39       ` Wei-Lin Chang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a53rk83s.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=r09922117@csie.ntu.edu.tw \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.