From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	kvm@vger.kernel.org, James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Przemyslaw Gaj <pgaj@cadence.com>
Subject: Re: [PATCH v3 14/18] KVM: arm64: nv: Add SW walker for AT S1 emulation
Date: Fri, 16 Aug 2024 10:22:43 +0100
Message-ID: <Zr8aYymB_2xSqIQp@raptor>
In-Reply-To: <86msldzlly.wl-maz@kernel.org>

Hi Marc,

On Thu, Aug 15, 2024 at 07:28:41PM +0100, Marc Zyngier wrote:
> 
> Hi Alex,
> 
> On Thu, 15 Aug 2024 17:44:02 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> [..]
> 
> > > +static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
> > > +			 struct s1_walk_result *wr, u64 va)
> > > +{
> > > +	u64 sctlr, tcr, tg, ps, ia_bits, ttbr;
> > > +	unsigned int stride, x;
> > > +	bool va55, tbi, lva, as_el0;
> > > +
> > > +	wi->regime = compute_translation_regime(vcpu, op);
> > > +	as_el0 = (op == OP_AT_S1E0R || op == OP_AT_S1E0W);
> > > +
> > > +	va55 = va & BIT(55);
> > > +
> > > +	if (wi->regime == TR_EL2 && va55)
> > > +		goto addrsz;
> > > +
> > > +	wi->s2 = (wi->regime == TR_EL10 &&
> > > +		  (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)));
> > 
> > This could be written on one line if there were a local variable for the HCR_EL2
> > register (which is already read multiple times in the function).
> 
> Sure thing.
> 
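For what it's worth, what I had in mind was roughly this (untested, just to
illustrate):

	u64 hcr = __vcpu_sys_reg(vcpu, HCR_EL2);

	wi->s2 = wi->regime == TR_EL10 && (hcr & (HCR_VM | HCR_DC));

with the other __vcpu_sys_reg(vcpu, HCR_EL2) reads in the function reusing
the same local.
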
> [...]
> 
> > > +	/* Let's put the MMU disabled case aside immediately */
> > > +	switch (wi->regime) {
> > > +	case TR_EL10:
> > > +		/*
> > > +		 * If dealing with the EL1&0 translation regime, 3 things
> > > +		 * can disable the S1 translation:
> > > +		 *
> > > +		 * - HCR_EL2.DC = 1
> > > +		 * - HCR_EL2.{E2H,TGE} = {0,1}
> > > +		 * - SCTLR_EL1.M = 0
> > > +		 *
> > > +		 * The TGE part is interesting. If we have decided that this
> > > +		 * is EL1&0, then it means that either {E2H,TGE} == {1,0} or
> > > +		 * {0,x}, and we only need to test for TGE == 1.
> > > +		 */
> > > +		if (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_DC | HCR_TGE))
> > > +			wr->level = S1_MMU_DISABLED;
> > 
> > There's no need to fallthrough and check SCTLR_ELx.M if the MMU disabled check
> > here is true.
> 
> I'm not sure it makes the code more readable. But if you do, why not.
> 
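To show what I meant, something along these lines (only sketching the TR_EL10
arm, untested):

	case TR_EL10:
		if (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_DC | HCR_TGE)) {
			wr->level = S1_MMU_DISABLED;
			break;	/* no need to also look at SCTLR_EL1.M */
		}
		fallthrough;

Entirely your call whether that reads better.
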
> [...]
> 
> > > +	/* Someone was silly enough to encode TG0/TG1 differently */
> > > +	if (va55) {
> > > +		wi->txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
> > > +		tg = FIELD_GET(TCR_TG1_MASK, tcr);
> > > +
> > > +		switch (tg << TCR_TG1_SHIFT) {
> > > +		case TCR_TG1_4K:
> > > +			wi->pgshift = 12;	 break;
> > > +		case TCR_TG1_16K:
> > > +			wi->pgshift = 14;	 break;
> > > +		case TCR_TG1_64K:
> > > +		default:	    /* IMPDEF: treat any other value as 64k */
> > > +			wi->pgshift = 16;	 break;
> > > +		}
> > 
> > Just a thought, wi->pgshift is used in several places to identify the guest page
> > size, might be useful to have something like PAGE_SHIFT_{4K,16K,64K}. That would
> > also make its usage consistent, because in some places wi->pgshift is compared
> > directly to 12, 14 or 16, in other places the page size is computed from
> > wi->pgshift and compared to SZ_4K, SZ_16K or SZ_64K.
> 
> I only found a single place where we compare wi->pgshift to a
> non-symbolic integer (as part of the R_YXNYW handling). Everywhere
> else, we use BIT(wi->pgshift) and compare it to SZ_*K. We moved away
> from the various PAGE_SHIFT_* macros some years ago, and I don't think
> we want them back.

Oh, I wasn't aware of that bit of history. No need to change the current code
then, it's readable enough.

> 
> What I can do is to convert the places where we init pgshift to use an
> explicit size using const_ilog2():
> 
> @@ -185,12 +188,12 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
>  
>  		switch (tg << TCR_TG1_SHIFT) {
>  		case TCR_TG1_4K:
> -			wi->pgshift = 12;	 break;
> +			wi->pgshift = const_ilog2(SZ_4K);	 break;
>  		case TCR_TG1_16K:
> -			wi->pgshift = 14;	 break;
> +			wi->pgshift = const_ilog2(SZ_16K);	 break;
>  		case TCR_TG1_64K:
>  		default:	    /* IMPDEF: treat any other value as 64k */
> -			wi->pgshift = 16;	 break;
> +			wi->pgshift = const_ilog2(SZ_64K);	 break;
>  		}
>  	} else {
>  		wi->txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
> @@ -198,12 +201,12 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
>  
>  		switch (tg << TCR_TG0_SHIFT) {
>  		case TCR_TG0_4K:
> -			wi->pgshift = 12;	 break;
> +			wi->pgshift = const_ilog2(SZ_4K);	 break;
>  		case TCR_TG0_16K:
> -			wi->pgshift = 14;	 break;
> +			wi->pgshift = const_ilog2(SZ_16K);	 break;
>  		case TCR_TG0_64K:
>  		default:	    /* IMPDEF: treat any other value as 64k */
> -			wi->pgshift = 16;	 break;
> +			wi->pgshift = const_ilog2(SZ_64K);	 break;
>  		}
>  	}
>  
> @@ -212,7 +215,7 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
>  		if (wi->txsz > 39)
>  			goto transfault_l0;
>  	} else {
> -		if (wi->txsz > 48 || (wi->pgshift == 16 && wi->txsz > 47))
> +		if (wi->txsz > 48 || (BIT(wi->pgshift) == SZ_64K && wi->txsz > 47))
>  			goto transfault_l0;
>  	}
> 
> 
> > 
> > > +	} else {
> > > +		wi->txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
> > > +		tg = FIELD_GET(TCR_TG0_MASK, tcr);
> > > +
> > > +		switch (tg << TCR_TG0_SHIFT) {
> > > +		case TCR_TG0_4K:
> > > +			wi->pgshift = 12;	 break;
> > > +		case TCR_TG0_16K:
> > > +			wi->pgshift = 14;	 break;
> > > +		case TCR_TG0_64K:
> > > +		default:	    /* IMPDEF: treat any other value as 64k */
> > > +			wi->pgshift = 16;	 break;
> > > +		}
> > > +	}
> > > +
> > > +	/* R_PLCGL, R_YXNYW */
> > > +	if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR2_EL1, ST, 48_47)) {
> > > +		if (wi->txsz > 39)
> > > +			goto transfault_l0;
> > > +	} else {
> > > +		if (wi->txsz > 48 || (wi->pgshift == 16 && wi->txsz > 47))
> > > +			goto transfault_l0;
> > > +	}
> > > +
> > > +	/* R_GTJBY, R_SXWGM */
> > > +	switch (BIT(wi->pgshift)) {
> > > +	case SZ_4K:
> > > +		lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN4, 52_BIT);
> > > +		lva &= tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS);
> > > +		break;
> > > +	case SZ_16K:
> > > +		lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN16, 52_BIT);
> > > +		lva &= tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS);
> > > +		break;
> > > +	case SZ_64K:
> > > +		lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, VARange, 52);
> > > +		break;
> > > +	}
> > > +
> > > +	if ((lva && wi->txsz < 12) || wi->txsz < 16)
> > > +		goto transfault_l0;
> > 
> > Let's assume lva = true, wi->txsz greater than 12, but smaller than 16, which is
> > architecturally allowed according to R_GTJBY and AArch64.S1MinTxSZ().
> > 
> > (lva && wi->txsz < 12) = false
> > wi->txsz < 16 = true
> > 
> > KVM treats it as a fault.
> 
> Gah... Fixed with:
> 
> @@ -231,7 +234,7 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
>  		break;
>  	}
>  
> -	if ((lva && wi->txsz < 12) || wi->txsz < 16)
> +	if ((lva && wi->txsz < 12) || (!lva && wi->txsz < 16))
>  		goto transfault_l0;
>  
>  	ia_bits = get_ia_size(wi);
> 
> Not that it has an impact yet, given that we don't support any of the
> 52bit stuff yet, but thanks for spotting this!

The change looks correct to me.
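
To double-check, taking the case I mentioned (lva = true, txsz = 13):

	old: (true && 13 < 12) || 13 < 16             -> false || true  -> fault
	new: (true && 13 < 12) || (!true && 13 < 16)  -> false || false -> no fault

which matches what AArch64.S1MinTxSZ() allows for the 52-bit case.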

> 
> [...]
> 
> > > +	/* R_VPBBF */
> > > +	if (check_output_size(wi->baddr, wi))
> > > +		goto transfault_l0;
> > 
> > I think R_VPBBF says that an Address size fault is generated here, not a
> > translation fault.
> 
> Indeed, another one fixed.
> 
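I assume the fix for this one is simply branching to the addrsz label for
that check, i.e. something along the lines of:

-	if (check_output_size(wi->baddr, wi))
-		goto transfault_l0;
+	if (check_output_size(wi->baddr, wi))
+		goto addrsz;
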
> > 
> > > +
> > > +	wi->baddr &= GENMASK_ULL(wi->max_oa_bits - 1, x);
> > > +
> > > +	return 0;
> > > +
> > > +addrsz:				/* Address Size Fault level 0 */
> > > +	fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(0), false, false);
> > > +	return -EFAULT;
> > > +
> > > +transfault_l0:			/* Translation Fault level 0 */
> > > +	fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(0), false, false);
> > > +	return -EFAULT;
> > > +}
> > 
> > [..]
> > 
> > > +static bool par_check_s1_perm_fault(u64 par)
> > > +{
> > > +	u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);
> > > +
> > > +	return  ((fst & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM &&
> > > +		 !(par & SYS_PAR_EL1_S));
> > 
> > ESR_ELx_FSC_PERM = 0x0c is a permission fault, level 0, which Arm ARM says can
> > only happen when FEAT_LPA2. I think the code should check that the value for
> > PAR_EL1.FST is in the interval (ESR_ELx_FSC_PERM_L(0), ESR_ELx_FSC_PERM_L(3)].
> 
> I honestly don't want to second-guess the HW. If it reports something
> that is the wrong level, why should we trust the FSC at all?

Sorry, I should have been clearer.

It's not about the hardware reporting a fault at level 0 of the translation
tables; it's about the function returning false when the hardware reports a
permission fault at level 1, 2 or 3.

For example, on a permission fault on level 3, PAR_EL1.FST = 0b001111 = 0x0F,
which means that the condition:

	(fst & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM

(where ESR_ELx_FSC_PERM is 0x0C) is false and KVM will fall back to the
software walker.
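
In code, I was thinking of something along these lines (untested, just to
show the interval I have in mind):

	static bool par_check_s1_perm_fault(u64 par)
	{
		u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);

		/* Permission fault at level 1, 2 or 3, reported for stage 1 */
		return (fst >= ESR_ELx_FSC_PERM_L(1) &&
			fst <= ESR_ELx_FSC_PERM_L(3) &&
			!(par & SYS_PAR_EL1_S));
	}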

Does that make sense to you?

Thanks,
Alex

