Re: [PATCH v3 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic

public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed

From: Marc Zyngier <maz@kernel.org>
To: Zenghui Yu <zenghui.yu@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>,
	wanghaibin.wang@huawei.com
Subject: Re: [PATCH v3 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
Date: Fri, 06 Feb 2026 11:05:12 +0000	[thread overview]
Message-ID: <86ms1m9lp3.wl-maz@kernel.org> (raw)
In-Reply-To: <3f88cd49-68f1-4276-a067-b7c6beadb27c@linux.dev>

On Wed, 04 Feb 2026 08:28:57 +0000,
Zenghui Yu <zenghui.yu@linux.dev> wrote:
> 
> Hi Marc,
> 
> [ chewing through the NV code.. ;-) ]

You fool! :)

> 
> On 6/14/24 10:45 PM, Marc Zyngier wrote:
> > From: Christoffer Dall <christoffer.dall@linaro.org>
> > 
> > Based on the pseudo-code in the ARM ARM, implement a stage 2 software
> > page table walker.
> 
> [...]
> 
> > +static u32 compute_fsc(int level, u32 fsc)
> > +{
> > +	return fsc | (level & 0x3);
> > +}
> > +
> > +static int get_ia_size(struct s2_walk_info *wi)
> > +{
> > +	return 64 - wi->t0sz;
> > +}
> > +
> > +static int check_base_s2_limits(struct s2_walk_info *wi,
> > +				int level, int input_size, int stride)
> > +{
> > +	int start_size, ia_size;
> > +
> > +	ia_size = get_ia_size(wi);
> > +
> > +	/* Check translation limits */
> > +	switch (BIT(wi->pgshift)) {
> > +	case SZ_64K:
> > +		if (level == 0 || (level == 1 && ia_size <= 42))
> 
> It looks broken as the pseudocode checks the limits based on
> *implemented PA size*, rather than on ia_size, which is essentially the
> input address size (64 - T0SZ) programmed by L1 hypervisor. They're
> different.
>
> We can probably get the implemented PA size by:
> 
> AArch64.PAMax()
> {
> 	parange = get_idreg_field_enum(kvm, ID_AA64MMFR0_EL1, PARANGE);
> 	return id_aa64mmfr0_parange_to_phys_shift(parange);
> }
> 
> Not sure if I've read the spec correctly.

I think that's also the way I read AArch64_S2InvalidSL(), which more
or less mirrors the above.

The question is what should we limit it to? Is it PARange, as you
suggest? Or the IPA range defined by userspace at VM creation (the
type argument, which ends up in kvm->arch.mmu.pgt->ia_bits)?

I think this is the former, but we probably also need to handle the
later on actual access (when reading the descriptor). Failure to read
the descriptor (because it is outside of a memslot) should result in a
SEA being injected in the guest.

> 
> > +			return -EFAULT;
> > +		break;
> > +	case SZ_16K:
> > +		if (level == 0 || (level == 1 && ia_size <= 40))
> > +			return -EFAULT;
> > +		break;
> > +	case SZ_4K:
> > +		if (level < 0 || (level == 0 && ia_size <= 42))
> > +			return -EFAULT;
> > +		break;
> > +	}
> > +
> > +	/* Check input size limits */
> > +	if (input_size > ia_size)
> 
> This is always false for the current code. ;-)

Yup. At least that doesn't introduce any extra bug! :)

>
> > +		return -EFAULT;
> > +
> > +	/* Check number of entries in starting level table */
> > +	start_size = input_size - ((3 - level) * stride + wi->pgshift);
> > +	if (start_size < 1 || start_size > stride + 4)
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +
> > +/* Check if output is within boundaries */
> > +static int check_output_size(struct s2_walk_info *wi, phys_addr_t output)
> > +{
> > +	unsigned int output_size = wi->max_oa_bits;
> > +
> > +	if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
> > +		return -1;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * This is essentially a C-version of the pseudo code from the ARM ARM
> > + * AArch64.TranslationTableWalk  function.  I strongly recommend looking at
> > + * that pseudocode in trying to understand this.
> > + *
> > + * Must be called with the kvm->srcu read lock held
> > + */
> > +static int walk_nested_s2_pgd(phys_addr_t ipa,
> > +			      struct s2_walk_info *wi, struct kvm_s2_trans *out)
> > +{
> > +	int first_block_level, level, stride, input_size, base_lower_bound;
> > +	phys_addr_t base_addr;
> > +	unsigned int addr_top, addr_bottom;
> > +	u64 desc;  /* page table entry */
> > +	int ret;
> > +	phys_addr_t paddr;
> > +
> > +	switch (BIT(wi->pgshift)) {
> > +	default:
> > +	case SZ_64K:
> > +	case SZ_16K:
> > +		level = 3 - wi->sl;
> > +		first_block_level = 2;
> > +		break;
> > +	case SZ_4K:
> > +		level = 2 - wi->sl;
> > +		first_block_level = 1;
> > +		break;
> > +	}
> > +
> > +	stride = wi->pgshift - 3;
> > +	input_size = get_ia_size(wi);
> > +	if (input_size > 48 || input_size < 25)
> > +		return -EFAULT;
> > +
> > +	ret = check_base_s2_limits(wi, level, input_size, stride);
> > +	if (WARN_ON(ret))
> > +		return ret;
> > +
> > +	base_lower_bound = 3 + input_size - ((3 - level) * stride +
> > +			   wi->pgshift);
> > +	base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
> > +
> > +	if (check_output_size(wi, base_addr)) {
> > +		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
> 
> With a wrongly programmed base address, we should report the ADDRSZ
> fault at level 0 (as per R_BFHQH and the pseudocode). It's easy to fix.
>

Yup. Although this rule describe S1 rather than S2 (we don't seem to
have anything saying the same thing for S2), but I expect the
behaviour to be exactly the same.

> > +static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
> > +{
> > +	wi->t0sz = vtcr & TCR_EL2_T0SZ_MASK;
> > +
> > +	switch (vtcr & VTCR_EL2_TG0_MASK) {
> > +	case VTCR_EL2_TG0_4K:
> > +		wi->pgshift = 12;	 break;
> > +	case VTCR_EL2_TG0_16K:
> > +		wi->pgshift = 14;	 break;
> > +	case VTCR_EL2_TG0_64K:
> > +	default:	    /* IMPDEF: treat any other value as 64k */
> > +		wi->pgshift = 16;	 break;
> > +	}
> > +
> > +	wi->sl = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
> > +	/* Global limit for now, should eventually be per-VM */
> > +	wi->max_oa_bits = min(get_kvm_ipa_limit(),
>                               ^^^
> 
> Should we use AArch64.PAMax() instead? As the output address size is
> never larger than the implemented PA size.
> 
> Now I'm wondering if we can let kvm_get_pa_bits() just return PAMax for
> (based on the exposed (to-L1) AA64MFR0.PARange value) and use it when
> possible.

Yes, that was the plan all along, but I got sidetracked.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

next prev parent reply	other threads:[~2026-02-06 11:05 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-14 14:45 [PATCH v3 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
2026-02-04  8:28   ` Zenghui Yu
2026-02-06 11:05     ` Marc Zyngier [this message]
2026-02-08 18:34       ` Zenghui Yu
2024-06-14 14:45 ` [PATCH v3 03/16] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
2024-08-21 19:11   ` Zenghui Yu
2024-08-22  6:31     ` Oliver Upton
2024-06-14 14:45 ` [PATCH v3 04/16] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 05/16] KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 06/16] KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 07/16] KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1 Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 08/16] KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 09/16] KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 10/16] KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 11/16] KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 12/16] KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 14/16] KVM: arm64: nv: Add handling of outer-shareable TLBI operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 15/16] KVM: arm64: nv: Add handling of range-based " Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 16/16] KVM: arm64: nv: Add handling of NXS-flavoured " Marc Zyngier
2024-06-19  8:41 ` [PATCH v3 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Oliver Upton
2024-11-21  8:11   ` Ganapatrao Kulkarni
2024-11-21 16:44     ` Marc Zyngier
2024-11-22 16:54       ` Ganapatrao Kulkarni
2024-11-22 19:04         ` Marc Zyngier
2024-11-23  9:49           ` Marc Zyngier
2024-12-05 11:50             ` Darren Hart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86ms1m9lp3.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=alexandru.elisei@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=gankulkarni@os.amperecomputing.com \
    --cc=james.morse@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=oliver.upton@linux.dev \
    --cc=suzuki.poulose@arm.com \
    --cc=wanghaibin.wang@huawei.com \
    --cc=yuzenghui@huawei.com \
    --cc=zenghui.yu@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox