From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
kvm@vger.kernel.org, James Morse <james.morse@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Oliver Upton <oliver.upton@linux.dev>,
Zenghui Yu <yuzenghui@huawei.com>,
Joey Gouly <joey.gouly@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Przemyslaw Gaj <pgaj@cadence.com>
Subject: Re: [PATCH v3 14/18] KVM: arm64: nv: Add SW walker for AT S1 emulation
Date: Fri, 16 Aug 2024 10:22:43 +0100 [thread overview]
Message-ID: <Zr8aYymB_2xSqIQp@raptor> (raw)
In-Reply-To: <86msldzlly.wl-maz@kernel.org>
Hi Marc,
On Thu, Aug 15, 2024 at 07:28:41PM +0100, Marc Zyngier wrote:
>
> Hi Alex,
>
> On Thu, 15 Aug 2024 17:44:02 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> [..]
>
> > > +static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
> > > + struct s1_walk_result *wr, u64 va)
> > > +{
> > > + u64 sctlr, tcr, tg, ps, ia_bits, ttbr;
> > > + unsigned int stride, x;
> > > + bool va55, tbi, lva, as_el0;
> > > +
> > > + wi->regime = compute_translation_regime(vcpu, op);
> > > + as_el0 = (op == OP_AT_S1E0R || op == OP_AT_S1E0W);
> > > +
> > > + va55 = va & BIT(55);
> > > +
> > > + if (wi->regime == TR_EL2 && va55)
> > > + goto addrsz;
> > > +
> > > + wi->s2 = (wi->regime == TR_EL10 &&
> > > + (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)));
> >
> > This could be written on one line if there were a local variable for the HCR_EL2
> > register (which is already read multiple times in the function).
>
> Sure thing.
>
> [...]
>
> > > + /* Let's put the MMU disabled case aside immediately */
> > > + switch (wi->regime) {
> > > + case TR_EL10:
> > > + /*
> > > + * If dealing with the EL1&0 translation regime, 3 things
> > > + * can disable the S1 translation:
> > > + *
> > > + * - HCR_EL2.DC = 1
> > > + * - HCR_EL2.{E2H,TGE} = {0,1}
> > > + * - SCTLR_EL1.M = 0
> > > + *
> > > + * The TGE part is interesting. If we have decided that this
> > > + * is EL1&0, then it means that either {E2H,TGE} == {1,0} or
> > > + * {0,x}, and we only need to test for TGE == 1.
> > > + */
> > > + if (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_DC | HCR_TGE))
> > > + wr->level = S1_MMU_DISABLED;
> >
> > There's no need to fallthrough and check SCTLR_ELx.M if the MMU disabled check
> > here is true.
>
> I'm not sure it makes the code more readable. But if you do, why not.
>
> [...]
>
> > > + /* Someone was silly enough to encode TG0/TG1 differently */
> > > + if (va55) {
> > > + wi->txsz = FIELD_GET(TCR_T1SZ_MASK, tcr);
> > > + tg = FIELD_GET(TCR_TG1_MASK, tcr);
> > > +
> > > + switch (tg << TCR_TG1_SHIFT) {
> > > + case TCR_TG1_4K:
> > > + wi->pgshift = 12; break;
> > > + case TCR_TG1_16K:
> > > + wi->pgshift = 14; break;
> > > + case TCR_TG1_64K:
> > > + default: /* IMPDEF: treat any other value as 64k */
> > > + wi->pgshift = 16; break;
> > > + }
> >
> > Just a thought, wi->pgshift is used in several places to identify the guest page
> > size, might be useful to have something like PAGE_SHIFT_{4K,16K,64K}. That would
> > also make its usage consistent, because in some places wi->pgshift is compared
> > directly to 12, 14 or 16, in other places the page size is computed from
> > wi->pgshift and compared to SZ_4K, SZ_16K or SZ_64K.
>
> I only found a single place where we compare wi->pgshift to a
> non-symbolic integer (as part of the R_YXNYW handling). Everywhere
> else, we use BIT(wi->pgshift) and compare it to SZ_*K. We moved away
> from the various PAGE_SHIFT_* macros some years ago, and I don't think
> we want them back.
Oh, I wasn't aware of that bit of history. No need to change the current code
then, it's readable enough.
>
> What I can do is to convert the places where we init pgshift to use an
> explicit size using const_ilog2():
>
> @@ -185,12 +188,12 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
>
> switch (tg << TCR_TG1_SHIFT) {
> case TCR_TG1_4K:
> - wi->pgshift = 12; break;
> + wi->pgshift = const_ilog2(SZ_4K); break;
> case TCR_TG1_16K:
> - wi->pgshift = 14; break;
> + wi->pgshift = const_ilog2(SZ_16K); break;
> case TCR_TG1_64K:
> default: /* IMPDEF: treat any other value as 64k */
> - wi->pgshift = 16; break;
> + wi->pgshift = const_ilog2(SZ_64K); break;
> }
> } else {
> wi->txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
> @@ -198,12 +201,12 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
>
> switch (tg << TCR_TG0_SHIFT) {
> case TCR_TG0_4K:
> - wi->pgshift = 12; break;
> + wi->pgshift = const_ilog2(SZ_4K); break;
> case TCR_TG0_16K:
> - wi->pgshift = 14; break;
> + wi->pgshift = const_ilog2(SZ_16K); break;
> case TCR_TG0_64K:
> default: /* IMPDEF: treat any other value as 64k */
> - wi->pgshift = 16; break;
> + wi->pgshift = const_ilog2(SZ_64K); break;
> }
> }
>
> @@ -212,7 +215,7 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
> if (wi->txsz > 39)
> goto transfault_l0;
> } else {
> - if (wi->txsz > 48 || (wi->pgshift == 16 && wi->txsz > 47))
> + if (wi->txsz > 48 || (BIT(wi->pgshift) == SZ_64K && wi->txsz > 47))
> goto transfault_l0;
> }
>
>
> >
> > > + } else {
> > > + wi->txsz = FIELD_GET(TCR_T0SZ_MASK, tcr);
> > > + tg = FIELD_GET(TCR_TG0_MASK, tcr);
> > > +
> > > + switch (tg << TCR_TG0_SHIFT) {
> > > + case TCR_TG0_4K:
> > > + wi->pgshift = 12; break;
> > > + case TCR_TG0_16K:
> > > + wi->pgshift = 14; break;
> > > + case TCR_TG0_64K:
> > > + default: /* IMPDEF: treat any other value as 64k */
> > > + wi->pgshift = 16; break;
> > > + }
> > > + }
> > > +
> > > + /* R_PLCGL, R_YXNYW */
> > > + if (!kvm_has_feat_enum(vcpu->kvm, ID_AA64MMFR2_EL1, ST, 48_47)) {
> > > + if (wi->txsz > 39)
> > > + goto transfault_l0;
> > > + } else {
> > > + if (wi->txsz > 48 || (wi->pgshift == 16 && wi->txsz > 47))
> > > + goto transfault_l0;
> > > + }
> > > +
> > > + /* R_GTJBY, R_SXWGM */
> > > + switch (BIT(wi->pgshift)) {
> > > + case SZ_4K:
> > > + lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN4, 52_BIT);
> > > + lva &= tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS);
> > > + break;
> > > + case SZ_16K:
> > > + lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR0_EL1, TGRAN16, 52_BIT);
> > > + lva &= tcr & (wi->regime == TR_EL2 ? TCR_EL2_DS : TCR_DS);
> > > + break;
> > > + case SZ_64K:
> > > + lva = kvm_has_feat(vcpu->kvm, ID_AA64MMFR2_EL1, VARange, 52);
> > > + break;
> > > + }
> > > +
> > > + if ((lva && wi->txsz < 12) || wi->txsz < 16)
> > > + goto transfault_l0;
> >
> > Let's assume lva = true, wi->txsz greater than 12, but smaller than 16, which is
> > architecturally allowed according to R_GTJBY and AArch64.S1MinTxSZ().
> >
> > (lva && wi->txsz < 12) = false
> > wi->txsz < 16 = true
> >
> > KVM treats it as a fault.
>
> Gah... Fixed with:
>
> @@ -231,7 +234,7 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
> break;
> }
>
> - if ((lva && wi->txsz < 12) || wi->txsz < 16)
> + if ((lva && wi->txsz < 12) || (!lva && wi->txsz < 16))
> goto transfault_l0;
>
> ia_bits = get_ia_size(wi);
>
> Not that it has an impact yet, given that we don't support any of the
> 52bit stuff yet, but thanks for spotting this!
The change looks correct to me.
>
> [...]
>
> > > + /* R_VPBBF */
> > > + if (check_output_size(wi->baddr, wi))
> > > + goto transfault_l0;
> >
> > I think R_VPBBF says that an Address size fault is generated here, not a
> > translation fault.
>
> Indeed, another one fixed.
>
> >
> > > +
> > > + wi->baddr &= GENMASK_ULL(wi->max_oa_bits - 1, x);
> > > +
> > > + return 0;
> > > +
> > > +addrsz: /* Address Size Fault level 0 */
> > > + fail_s1_walk(wr, ESR_ELx_FSC_ADDRSZ_L(0), false, false);
> > > + return -EFAULT;
> > > +
> > > +transfault_l0: /* Translation Fault level 0 */
> > > + fail_s1_walk(wr, ESR_ELx_FSC_FAULT_L(0), false, false);
> > > + return -EFAULT;
> > > +}
> >
> > [..]
> >
> > > +static bool par_check_s1_perm_fault(u64 par)
> > > +{
> > > + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);
> > > +
> > > + return ((fst & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM &&
> > > + !(par & SYS_PAR_EL1_S));
> >
> > ESR_ELx_FSC_PERM = 0x0c is a permission fault, level 0, which Arm ARM says can
> > only happen when FEAT_LPA2. I think the code should check that the value for
> > PAR_EL1.FST is in the interval (ESR_ELx_FSC_PERM_L(0), ESR_ELx_FSC_PERM_L(3)].
>
> I honestly don't want to second-guess the HW. If it reports something
> that is the wrong level, why should we trust the FSC at all?
Sorry, I should have been clearer.
It's not about the hardware reporting a fault on level 0 of the translation
tables, it's about the function returning false if the hardware reports a
permission fault on levels 1, 2 or 3 of the translation tables.
For example, on a permssion fault on level 3, PAR_EL1. FST = 0b001111 = 0x0F,
which means that the condition:
(fst & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM (which is 0x0C) is false and KVM
will fall back to the software walker.
Does that make sense to you?
Thanks,
Alex
next prev parent reply other threads:[~2024-08-16 9:22 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-13 10:05 [PATCH v3 00/18] KVM: arm64: nv: Add support for address translation instructions Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 01/18] arm64: Add missing APTable and TCR_ELx.HPD masks Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 02/18] arm64: Add PAR_EL1 field description Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 03/18] arm64: Add system register encoding for PSTATE.PAN Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 04/18] arm64: Add ESR_ELx_FSC_ADDRSZ_L() helper Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 05/18] KVM: arm64: Make kvm_at() take an OP_AT_* Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 06/18] KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 07/18] KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptor Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 08/18] KVM: arm64: nv: Honor absence of FEAT_PAN2 Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 09/18] KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W} Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 10/18] KVM: arm64: nv: Add basic emulation of AT S1E1{R,W}P Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 11/18] KVM: arm64: nv: Add basic emulation of AT S1E2{R,W} Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 12/18] KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W} Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 13/18] KVM: arm64: nv: Make ps_to_output_size() generally available Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 14/18] KVM: arm64: nv: Add SW walker for AT S1 emulation Marc Zyngier
2024-08-14 10:08 ` Marc Zyngier
2024-08-15 16:44 ` Alexandru Elisei
2024-08-15 18:28 ` Marc Zyngier
2024-08-16 9:22 ` Alexandru Elisei [this message]
2024-08-16 10:37 ` Marc Zyngier
2024-08-16 11:02 ` Alexandru Elisei
2024-08-16 13:44 ` Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 15/18] KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 16/18] KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3 Marc Zyngier
2024-08-15 16:47 ` Alexandru Elisei
2024-08-13 10:05 ` [PATCH v3 17/18] KVM: arm64: nv: Plumb handling of AT S1* traps from EL2 Marc Zyngier
2024-08-13 10:05 ` [PATCH v3 18/18] KVM: arm64: nv: Add support for FEAT_ATS1A Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zr8aYymB_2xSqIQp@raptor \
--to=alexandru.elisei@arm.com \
--cc=anshuman.khandual@arm.com \
--cc=james.morse@arm.com \
--cc=joey.gouly@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=pgaj@cadence.com \
--cc=suzuki.poulose@arm.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.