Re: [PATCH v3 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Zenghui Yu <zenghui.yu@linux.dev>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>,
	wanghaibin.wang@huawei.com
Subject: Re: [PATCH v3 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
Date: Wed, 4 Feb 2026 16:28:57 +0800	[thread overview]
Message-ID: <3f88cd49-68f1-4276-a067-b7c6beadb27c@linux.dev> (raw)
In-Reply-To: <20240614144552.2773592-3-maz@kernel.org>

Hi Marc,

[ chewing through the NV code.. ;-) ]

On 6/14/24 10:45 PM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Based on the pseudo-code in the ARM ARM, implement a stage 2 software
> page table walker.

[...]

> +static u32 compute_fsc(int level, u32 fsc)
> +{
> +	return fsc | (level & 0x3);
> +}
> +
> +static int get_ia_size(struct s2_walk_info *wi)
> +{
> +	return 64 - wi->t0sz;
> +}
> +
> +static int check_base_s2_limits(struct s2_walk_info *wi,
> +				int level, int input_size, int stride)
> +{
> +	int start_size, ia_size;
> +
> +	ia_size = get_ia_size(wi);
> +
> +	/* Check translation limits */
> +	switch (BIT(wi->pgshift)) {
> +	case SZ_64K:
> +		if (level == 0 || (level == 1 && ia_size <= 42))

It looks broken as the pseudocode checks the limits based on
*implemented PA size*, rather than on ia_size, which is essentially the
input address size (64 - T0SZ) programmed by L1 hypervisor. They're
different.

We can probably get the implemented PA size by:

AArch64.PAMax()
{
	parange = get_idreg_field_enum(kvm, ID_AA64MMFR0_EL1, PARANGE);
	return id_aa64mmfr0_parange_to_phys_shift(parange);
}

Not sure if I've read the spec correctly.

> +			return -EFAULT;
> +		break;
> +	case SZ_16K:
> +		if (level == 0 || (level == 1 && ia_size <= 40))
> +			return -EFAULT;
> +		break;
> +	case SZ_4K:
> +		if (level < 0 || (level == 0 && ia_size <= 42))
> +			return -EFAULT;
> +		break;
> +	}
> +
> +	/* Check input size limits */
> +	if (input_size > ia_size)

This is always false for the current code. ;-)

> +		return -EFAULT;
> +
> +	/* Check number of entries in starting level table */
> +	start_size = input_size - ((3 - level) * stride + wi->pgshift);
> +	if (start_size < 1 || start_size > stride + 4)
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +/* Check if output is within boundaries */
> +static int check_output_size(struct s2_walk_info *wi, phys_addr_t output)
> +{
> +	unsigned int output_size = wi->max_oa_bits;
> +
> +	if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
> +		return -1;
> +
> +	return 0;
> +}
> +
> +/*
> + * This is essentially a C-version of the pseudo code from the ARM ARM
> + * AArch64.TranslationTableWalk  function.  I strongly recommend looking at
> + * that pseudocode in trying to understand this.
> + *
> + * Must be called with the kvm->srcu read lock held
> + */
> +static int walk_nested_s2_pgd(phys_addr_t ipa,
> +			      struct s2_walk_info *wi, struct kvm_s2_trans *out)
> +{
> +	int first_block_level, level, stride, input_size, base_lower_bound;
> +	phys_addr_t base_addr;
> +	unsigned int addr_top, addr_bottom;
> +	u64 desc;  /* page table entry */
> +	int ret;
> +	phys_addr_t paddr;
> +
> +	switch (BIT(wi->pgshift)) {
> +	default:
> +	case SZ_64K:
> +	case SZ_16K:
> +		level = 3 - wi->sl;
> +		first_block_level = 2;
> +		break;
> +	case SZ_4K:
> +		level = 2 - wi->sl;
> +		first_block_level = 1;
> +		break;
> +	}
> +
> +	stride = wi->pgshift - 3;
> +	input_size = get_ia_size(wi);
> +	if (input_size > 48 || input_size < 25)
> +		return -EFAULT;
> +
> +	ret = check_base_s2_limits(wi, level, input_size, stride);
> +	if (WARN_ON(ret))
> +		return ret;
> +
> +	base_lower_bound = 3 + input_size - ((3 - level) * stride +
> +			   wi->pgshift);
> +	base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
> +
> +	if (check_output_size(wi, base_addr)) {
> +		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);

With a wrongly programmed base address, we should report the ADDRSZ
fault at level 0 (as per R_BFHQH and the pseudocode). It's easy to fix.

> +static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
> +{
> +	wi->t0sz = vtcr & TCR_EL2_T0SZ_MASK;
> +
> +	switch (vtcr & VTCR_EL2_TG0_MASK) {
> +	case VTCR_EL2_TG0_4K:
> +		wi->pgshift = 12;	 break;
> +	case VTCR_EL2_TG0_16K:
> +		wi->pgshift = 14;	 break;
> +	case VTCR_EL2_TG0_64K:
> +	default:	    /* IMPDEF: treat any other value as 64k */
> +		wi->pgshift = 16;	 break;
> +	}
> +
> +	wi->sl = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
> +	/* Global limit for now, should eventually be per-VM */
> +	wi->max_oa_bits = min(get_kvm_ipa_limit(),
                              ^^^

Should we use AArch64.PAMax() instead? As the output address size is
never larger than the implemented PA size.

Now I'm wondering if we can let kvm_get_pa_bits() just return PAMax for
(based on the exposed (to-L1) AA64MFR0.PARange value) and use it when
possible.

Thanks,
Zenghui

next prev parent reply	other threads:[~2026-02-04  8:29 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-14 14:45 [PATCH v3 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
2026-02-04  8:28   ` Zenghui Yu [this message]
2026-02-06 11:05     ` Marc Zyngier
2026-02-08 18:34       ` Zenghui Yu
2024-06-14 14:45 ` [PATCH v3 03/16] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
2024-08-21 19:11   ` Zenghui Yu
2024-08-22  6:31     ` Oliver Upton
2024-06-14 14:45 ` [PATCH v3 04/16] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 05/16] KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 06/16] KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 07/16] KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1 Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 08/16] KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 09/16] KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 10/16] KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 11/16] KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 12/16] KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 14/16] KVM: arm64: nv: Add handling of outer-shareable TLBI operations Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 15/16] KVM: arm64: nv: Add handling of range-based " Marc Zyngier
2024-06-14 14:45 ` [PATCH v3 16/16] KVM: arm64: nv: Add handling of NXS-flavoured " Marc Zyngier
2024-06-19  8:41 ` [PATCH v3 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Oliver Upton
2024-11-21  8:11   ` Ganapatrao Kulkarni
2024-11-21 16:44     ` Marc Zyngier
2024-11-22 16:54       ` Ganapatrao Kulkarni
2024-11-22 19:04         ` Marc Zyngier
2024-11-23  9:49           ` Marc Zyngier
2024-12-05 11:50             ` Darren Hart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3f88cd49-68f1-4276-a067-b7c6beadb27c@linux.dev \
    --to=zenghui.yu@linux.dev \
    --cc=alexandru.elisei@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=gankulkarni@os.amperecomputing.com \
    --cc=james.morse@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=suzuki.poulose@arm.com \
    --cc=wanghaibin.wang@huawei.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.