From: Marc Zyngier <maz@kernel.org>
To: Sascha Bischoff <Sascha.Bischoff@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>, nd <nd@arm.com>,
	"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
	Joey Gouly <Joey.Gouly@arm.com>,
	Suzuki Poulose <Suzuki.Poulose@arm.com>,
	"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
	"peter.maydell@linaro.org" <peter.maydell@linaro.org>,
	"lpieralisi@kernel.org" <lpieralisi@kernel.org>,
	Timothy Hayes <Timothy.Hayes@arm.com>
Subject: Re: [PATCH 08/43] KVM: arm64: gic-v5: Introduce guest IST alloc and management
Date: Wed, 29 Apr 2026 15:29:49 +0100	[thread overview]
Message-ID: <86lde5zvoi.wl-maz@kernel.org> (raw)
In-Reply-To: <20260427160547.3129448-9-sascha.bischoff@arm.com>

On Mon, 27 Apr 2026 17:08:46 +0100,
Sascha Bischoff <Sascha.Bischoff@arm.com> wrote:
> 
> GICv5 guests use Interrupt State Tables (ISTs) to track and manage the
> interrupt state for SPIs and LPIs. These ISTs are provided to the
> host's IRS via the VMTE.
> 
> On a host GICv5 system, SPIs do not require any up-front memory
> allocation prior to their use, unlike LPIs which require the OS to
> allocate an IST. For a GICv5 guest, the same holds from the guest's
> point of view - SPIs should require no explicit memory allocation
> by the guest. This means that the hypervisor must provision the memory
> which it passes to the IRS for managing a guest's SPI state.
> 
> In light of the above, the hypervisor allocates the SPI IST prior to
> running the guest for the first time. As only a small number of SPIs
> are expected, this is always allocated as a linear IST. The host is
> responsible for freeing this memory on guest teardown.
> 
> For LPIs, the OS needs to provision memory for state tracking. This
> applies to both hosts and guests, and so the guest will provision some
> memory for the LPI IST. However, this is not directly used by
> KVM. Instead, KVM allocates a shadow LPI IST which is passed to the
> IRS (in the VMTE). Again, on guest teardown, the hypervisor must free
> this memory. The LPI IST is allocated as a two-level structure,
> as many more LPIs are expected than SPIs.
> 
> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
> ---
>  arch/arm64/kvm/vgic/vgic-v5-tables.c | 531 +++++++++++++++++++++++++++
>  arch/arm64/kvm/vgic/vgic-v5-tables.h |  22 ++
>  include/linux/irqchip/arm-gic-v5.h   |   3 +
>  3 files changed, 556 insertions(+)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c b/arch/arm64/kvm/vgic/vgic-v5-tables.c
> index 502d05d46cccf..de905f37b61a5 100644
> --- a/arch/arm64/kvm/vgic/vgic-v5-tables.c
> +++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
> @@ -501,6 +501,25 @@ int vgic_v5_vmte_init(struct kvm *kvm)
>  	return ret;
>  }
>  
> +/*
> + * The following set of forward declarations makes the code layout a *little*
> + * clearer as it lets us keep the IST-related code together.
> + */
> +static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
> +				    unsigned int id_bits,
> +				    unsigned int istsz);
> +static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int id_bits,
> +				unsigned int istsz, unsigned int l2_split);
> +static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int id_bits,
> +				 unsigned int istsz, unsigned int l2_split);
> +static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm,
> +					   unsigned int id_bits,
> +					   unsigned int istsz,
> +					   unsigned int l2_split);
> +static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi);
> +static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi);
> +static int vgic_v5_spi_ist_free(struct kvm *kvm);
> +
>  /*
>   * Release the VMT Entry, freeing up any allocated data structures before
>   * zeroing the VMTE.
> @@ -531,6 +550,18 @@ int vgic_v5_vmte_release(struct kvm *kvm)
>  	kfree(vmi->vmd_base);
>  	kfree(vmi->vpet_base);
>  
> +	/* If we have an LPI IST, free it */
> +	if (vmi->h_lpi_ist)
> +		ret = vgic_v5_lpi_ist_free(kvm);
> +	if (ret)
> +		return ret;
> +
> +	/* If we have an SPI IST, free it */
> +	if (vmi->h_spi_ist)
> +		ret = vgic_v5_spi_ist_free(kvm);
> +	if (ret)
> +		return ret;
> +
>  	xa_erase(&vm_info, vm_id);
>  	kfree(vmi);
>  
> @@ -634,3 +665,503 @@ int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu)
>  
>  	return 0;
>  }
> +
> +/*
> + * Assign an already allocated IST to the VM by populating the fields in the
> + * corresponding VMTE. We re-use this code for both an SPI IST and LPI IST, even
> + * if the paths to reach it might be vastly different.
> + */
> +int vgic_v5_vmte_assign_ist(struct kvm *kvm, phys_addr_t ist_base,
> +			    bool two_level, unsigned int id_bits,
> +			    unsigned int l2sz, unsigned int istsz,
> +			    bool spi_ist)
> +{
> +	struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct gicv5_cmd_info cmd_info;
> +	struct vmtl2_entry *vmte;
> +	unsigned int section;
> +	u64 tmp;
> +	int ret;
> +
> +	section = spi_ist ? GICV5_VMTEL2_SPI_SECTION : GICV5_VMTEL2_LPI_SECTION;

Section? What is a section? This needs documentation (11.2.2 in the
EAC0 version of the spec) so that people can understand that you are
talking about the 64-bit word number in the Level-2 VM Table Entry.
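
Something along these lines next to the GICV5_VMTEL2_*_SECTION
definitions would go a long way (wording is mine, adjust as needed):

	/*
	 * The IST configuration lives in the Level-2 VM Table Entry, one
	 * 64-bit word per interrupt type (see 11.2.2 in the EAC0 version
	 * of the spec): word 2 for the LPI IST, word 3 for the SPI IST.
	 */
	#define GICV5_VMTEL2_LPI_SECTION	2
	#define GICV5_VMTEL2_SPI_SECTION	3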

> +
> +	if (ist_base & ~GICV5_VMTEL2E_IST_ADDR) {
> +		kvm_err("IST alignment issue! Address: 0x%llx, Mask 0x%llx\n",
> +			ist_base, GICV5_VMTEL2E_IST_ADDR);
> +		return -EINVAL;
> +	}
> +
> +	ret = vgic_v5_get_l2_vmte(vm_id, &vmte);
> +	if (ret)
> +		return ret;
> +
> +	/* Bail if already allocated - something is broken! */
> +	if (FIELD_GET(GICV5_VMTEL2E_IST_VALID, vmte->val[section])) {
> +		vgic_v5_clean_inval(vmte, sizeof(*vmte), true, true);

Still this odd construct. I'm starting to wonder whether I'm really
missing something.
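
If the entry really is valid at this point, what does the CMO buy you?
I would have expected a plain:

	if (FIELD_GET(GICV5_VMTEL2E_IST_VALID, vmte->val[section]))
		return -EINVAL;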

> +		return -EINVAL;
> +	}
> +
> +	tmp = FIELD_PREP(GICV5_VMTEL2E_IST_L2SZ, l2sz);
> +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ADDR,
> +			ist_base >> GICV5_VMTEL2E_IST_ADDR_SHIFT);
> +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ISTSZ, istsz);
> +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ID_BITS, id_bits);
> +	tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_STRUCTURE, two_level);
> +
> +	WRITE_ONCE(vmte->val[section], cpu_to_le64(tmp));
> +	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, false);
> +
> +	/* Finally, mark the entry as valid */
> +	cmd_info.cmd_type = spi_ist ? SPI_VIST_MAKE_VALID : LPI_VIST_MAKE_VALID;
> +	ret = irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu0), &cmd_info);
> +
> +	/* Any cached entries we now have are stale! */
> +	vgic_v5_clean_inval(vmte, sizeof(*vmte), false, true);

Shouldn't the clean operation happen *before* you call into the IRQ
stack? Handing the entry over while it may still be dirty feels
dangerous, even if the callback doesn't do much.
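
i.e. something like this (only a sketch, assuming a combined
clean+invalidate is enough before the entry is handed over):

	WRITE_ONCE(vmte->val[section], cpu_to_le64(tmp));
	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, true);

	/* Finally, mark the entry as valid */
	cmd_info.cmd_type = spi_ist ? SPI_VIST_MAKE_VALID : LPI_VIST_MAKE_VALID;
	return irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu0), &cmd_info);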

> +
> +	return ret;
> +}
> +
> +/*
> + * Helper to determine the correct l2sz to use based on the combination of
> + * PAGE_SIZE and whatever hardware supports.
> + */
> +static unsigned int vgic_v5_ist_l2sz(void)
> +{
> +	switch (PAGE_SIZE) {
> +	case SZ_64K:
> +		if (gicv5_host_ist_caps.ist_l2sz & 0x4)

Please add definitions for IRS_IDR2.IST_L2SZ.
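
Something like the below (the names are my invention, pick whatever
matches the spec) would avoid the bare 0x1/0x2/0x4 magic numbers:

	/* IRS_IDR2.IST_L2SZ: supported L2 IST granule sizes */
	#define GICV5_IRS_IDR2_IST_L2SZ_4K	BIT(0)
	#define GICV5_IRS_IDR2_IST_L2SZ_16K	BIT(1)
	#define GICV5_IRS_IDR2_IST_L2SZ_64K	BIT(2)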

> +			return GICV5_IRS_IST_CFGR_L2SZ_64K;
> +		fallthrough;
> +	case SZ_4K:
> +		if (gicv5_host_ist_caps.ist_l2sz & 0x1)
> +			return GICV5_IRS_IST_CFGR_L2SZ_4K;
> +		fallthrough;
> +	case SZ_16K:
> +		if (gicv5_host_ist_caps.ist_l2sz & 0x2)
> +			return GICV5_IRS_IST_CFGR_L2SZ_16K;
> +		break;
> +	}
> +
> +	if (gicv5_host_ist_caps.ist_l2sz & 0x1)
> +		return GICV5_IRS_IST_CFGR_L2SZ_4K;
> +
> +	return GICV5_IRS_IST_CFGR_L2SZ_64K;
> +}
> +
> +/* Helper to determine ISTE size based on metadata requirements */
> +static unsigned int vgic_v5_ist_istsz(unsigned int id_bits)
> +{
> +	if (!gicv5_host_ist_caps.istmd)
> +		return GICV5_IRS_IST_CFGR_ISTSZ_4;
> +
> +	if (id_bits >= gicv5_host_ist_caps.istmd_sz)
> +		return GICV5_IRS_IST_CFGR_ISTSZ_16;
> +
> +	return GICV5_IRS_IST_CFGR_ISTSZ_8;
> +}
> +
> +/*
> + * Allocate a Linear IST - always used for SPIs and potentially LPIs.
> + *
> + * The calculation for n has been taken from the GICv5 spec.

Bonus points if you add a reference to the relevant part of the spec.

> + *
> + * NOTE: istsz is the FIELD used by GICv5, not the actual size (or log2() of the
> + * size).
> + */
> +static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
> +				    unsigned int id_bits, unsigned int istsz)
> +{
> +	const size_t n = id_bits + 1 + istsz;
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +	__le64 *ist;
> +	u32 l1sz;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (WARN_ON_ONCE(!vmi))
> +		return -EINVAL;
> +
> +	/*
> +	 * Allocate the IST. We only have one level, so we just use the L2 ISTE.
> +	 */
> +	l1sz = BIT(n + 1);
> +	ist = kzalloc(l1sz, GFP_KERNEL);
> +	if (!ist)
> +		return -ENOMEM;
> +
> +	if (spi_ist) {
> +		vmi->h_spi_ist = ist;
> +	} else {
> +		vmi->h_lpi_ist_structure = false;
> +		vmi->h_lpi_ist = ist;
> +	}
> +
> +	vgic_v5_clean_inval(ist, l1sz, true, true);
> +
> +	return 0;
> +}
> +
> +/*
> + * Allocate the first level of a two-level IST - LPI, only.
> + *
> + * The calculations for n, l1_size have been taken from the GICv5 spec.
> + *
> + * NOTE: istsz and l2sz are the FIELDS used by GICv5, not the actual sizes (or
> + * log2() of the sizes).
> + */
> +static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int id_bits,
> +				unsigned int istsz, unsigned int l2sz)
> +{
> +	const size_t n =  max(5, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	const u32 l1_size = BIT(n + 1);
> +	struct vgic_v5_vm_info *vmi;
> +	__le64 *ist;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (!vmi)
> +		return -EINVAL;
> +
> +	ist = kzalloc(l1_size, GFP_KERNEL);
> +	if (!ist)
> +		return -ENOMEM;
> +
> +	vmi->h_lpi_ist_structure = true;
> +	vmi->h_lpi_ist = ist;
> +
> +	vgic_v5_clean_inval(ist, l1_size, true, true);
> +
> +	return 0;
> +}
> +
> +/*
> + * Allocate ALL of the second level ISTs for a two-level IST - LPI, only.
> + *
> + * The calculations for n, l1_entries, l2_size have been taken from the GICv5
> + * spec.
> + *
> + * NOTE: istsz and l2sz are the FIELDS used by GICv5, not the actual sizes (or
> + * log2() of the sizes).
> + */
> +static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int id_bits,
> +				unsigned int istsz, unsigned int l2sz)
> +{
> +	const size_t n =  max(5, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
> +	const int l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
> +	const size_t l2_size = BIT(11 + (2 * l2sz) + 1);
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +	__le64 *l2ist;
> +	__le64 *l1ist;
> +	int index;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (WARN_ON_ONCE(!vmi))
> +		return -EINVAL;
> +
> +	l1ist = vmi->h_lpi_ist;
> +
> +	/*
> +	 * Allocate the storage for the pointers to the L2 ISTs (used when
> +	 * freeing later).
> +	 */
> +	vmi->h_lpi_l2_ists = kzalloc_objs(*vmi->h_lpi_l2_ists, l1_entries,
> +					  GFP_KERNEL);
> +	if (!vmi->h_lpi_l2_ists)
> +		return -ENOMEM;
> +
> +	/* Allocate the L2 IST for each L1 IST entry */
> +	for (index = 0; index < l1_entries; ++index) {
> +		l2ist = kzalloc(l2_size, GFP_KERNEL);
> +		if (!l2ist) {
> +			while (--index >= 0)
> +				kfree(vmi->h_lpi_l2_ists[index]);
> +
> +			kfree(vmi->h_lpi_l2_ists);
> +			vmi->h_lpi_l2_ists = NULL;
> +
> +			return -ENOMEM;
> +		}
> +
> +		/*
> +		 * We are not doing on-demand allocation of the L2 ISTs, and are
> +		 * instead provisioning the whole IST up front. This means that
> +		 * we are able to mark the L2 ISTs as valid in the L1 ISTEs as
> +		 * the overall IST is not yet valid.
> +		 */
> +		l1ist[index] = cpu_to_le64(
> +			virt_to_phys(l2ist) & GICV5_ISTL1E_L2_ADDR_MASK) |
> +			GICV5_ISTL1E_VALID;
> +
> +		vmi->h_lpi_l2_ists[index] = l2ist;
> +
> +		vgic_v5_clean_inval(l2ist, l2_size, true, true);
> +	}
> +
> +	/* Handle CMOs for the whole L1 IST in one go */
> +	vgic_v5_clean_inval(l1ist, l1_entries * sizeof(*l1ist), true, false);
> +
> +	return 0;
> +}
> +
> +/* Allocate a two-level IST - LPIs, only */
> +static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm, unsigned int id_bits,
> +					   unsigned int istsz, unsigned int l2sz)
> +{
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +	int ret;
> +
> +	/*
> +	 * Allocate the L1 IST first, then all of the L2s. Everything
> +	 * is preallocated and we do no on-demand IST allocation. This
> +	 * is to avoid needing to track if and when the guest is doing
> +	 * on-demand IST allocation.
> +	 */
> +	ret = vgic_v5_alloc_l1_ist(kvm, id_bits, istsz, l2sz);
> +	if (ret)
> +		return ret;
> +
> +	ret = vgic_v5_alloc_l2_ists(kvm, id_bits, istsz, l2sz);
> +	if (ret) {
> +		/* Free the L1 IST again */
> +		vmi = xa_load(&vm_info, vm_id);
> +		kfree(vmi->h_lpi_ist);
> +		vmi->h_lpi_ist = 0;
> +
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void vgic_v5_free_allocated_lpi_ist(struct vgic_v5_vm_info *vmi,
> +					   unsigned int id_bits,
> +					   unsigned int istsz,
> +					   unsigned int l2sz)
> +{
> +	if (!vmi->h_lpi_ist_structure) {
> +		kfree(vmi->h_lpi_ist);
> +		vmi->h_lpi_ist = NULL;
> +		return;
> +	}
> +
> +	if (vmi->h_lpi_l2_ists) {
> +		const size_t n = max(2, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
> +		const int l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
> +		int index;
> +
> +		for (index = 0; index < l1_entries; ++index)
> +			kfree(vmi->h_lpi_l2_ists[index]);
> +
> +		kfree(vmi->h_lpi_l2_ists);
> +		vmi->h_lpi_l2_ists = NULL;
> +	}
> +
> +	kfree(vmi->h_lpi_ist);
> +	vmi->h_lpi_ist = NULL;
> +}
> +
> +void vgic_v5_free_allocated_spi_ist(struct kvm *kvm)
> +{
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (WARN_ON_ONCE(!vmi))
> +		return;
> +
> +	kfree(vmi->h_spi_ist);
> +	vmi->h_spi_ist = NULL;
> +}
> +
> +/*
> + * Free a Linear IST. Can only happen once the VM is dead.
> + */
> +static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi)
> +{
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vmtl2_entry *vmte;
> +	struct vgic_v5_vm_info *vmi;
> +	int section, ret;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (!vmi)
> +		return -EINVAL;
> +
> +	ret = vgic_v5_get_l2_vmte(vm_id, &vmte);
> +	if (ret)
> +		return ret;
> +
> +	if (spi) {
> +		section = GICV5_VMTEL2_SPI_SECTION;
> +		vgic_v5_free_allocated_spi_ist(kvm);
> +	} else {
> +		section = GICV5_VMTEL2_LPI_SECTION;
> +		vgic_v5_free_allocated_lpi_ist(vmi, 0, 0, 0);
> +	}
> +
> +	/* The VM should be dead here, so we can just zero the VMT section */
> +	WRITE_ONCE(vmte->val[section], 0ULL);
> +	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, true);
> +
> +	return 0;
> +}
> +
> +/*
> + * Free a Two-Level IST. Can only happen once the VM is dead.
> + */
> +static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi)
> +{
> +	unsigned int id_bits, istsz, l2sz;
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +	__le64 *l1ist, tmp;
> +	struct vmtl2_entry *vmte;
> +	int section, l1_entries;
> +	size_t n;
> +	int ret;
> +
> +	/* We don't create two-level SPI ISTs, so freeing is a bad idea! */
> +	if (spi)
> +		return -EINVAL;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (!vmi)
> +		return -EINVAL;
> +
> +	section = GICV5_VMTEL2_LPI_SECTION;
> +	l1ist = vmi->h_lpi_ist;
> +
> +	if (!vmi->h_lpi_ist_structure)
> +		return -EINVAL;
> +
> +	ret = vgic_v5_get_l2_vmte(vm_id, &vmte);
> +	if (ret)
> +		return ret;
> +
> +	tmp = le64_to_cpu(READ_ONCE(vmte->val[section]));
> +
> +	id_bits = FIELD_GET(GICV5_VMTEL2E_IST_ID_BITS, tmp);
> +	istsz = FIELD_GET(GICV5_VMTEL2E_IST_ISTSZ, tmp);
> +	l2sz = FIELD_GET(GICV5_VMTEL2E_IST_L2SZ, tmp);
> +
> +	/* Calculation for n taken from the GICv5 specification */
> +	n =  max(2, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
> +	l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
> +
> +	vgic_v5_free_allocated_lpi_ist(vmi, id_bits, istsz, l2sz);
> +
> +	/* The VM must be dead, so we can just zero the VMT section */
> +	WRITE_ONCE(vmte->val[section], 0ULL);
> +
> +	vgic_v5_clean_inval(vmte, sizeof(*vmte), true, true);
> +
> +	return 0;
> +}
> +
> +/*
> + * Allocate an IST for SPIs.
> + *
> + * We don't anticipate a large number of SPIs being allocated. Therefore, we
> + * always allocate a Linear IST for SPIs. This will need to be revisited should
> + * that assumption no longer hold.
> + */
> +int vgic_v5_spi_ist_allocate(struct kvm *kvm, phys_addr_t *base_addr,
> +			     unsigned int id_bits, unsigned int istsz)
> +{
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +	int ret;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (WARN_ON_ONCE(!vmi))
> +		return -EINVAL;
> +
> +	ret = vgic_v5_alloc_linear_ist(kvm, true, id_bits, istsz);
> +	if (ret)
> +		return ret;
> +
> +	*base_addr = virt_to_phys(vmi->h_spi_ist);
> +
> +	return 0;
> +}
> +
> +/*
> + * Free the IST for SPIs. Should only happen once the VM is dead.
> + */
> +static int vgic_v5_spi_ist_free(struct kvm *kvm)
> +{
> +	return vgic_v5_linear_ist_free(kvm, true);
> +}
> +
> +/*
> + * Allocate an IST for LPIs.
> + *
> + * Unlike with SPIs, we anticipate that the guest will allocate a relatively
> + * large number of LPIs. Therefore, while we support doing a linear LPI IST, it
> + * is expected that LPI ISTs will be two-level.
> + */
> +int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits)
> +{
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +	unsigned int istsz, l2sz;
> +	phys_addr_t phys_addr;
> +	bool two_level;
> +	int ret;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (WARN_ON_ONCE(!vmi))
> +		return -EINVAL;
> +
> +	istsz = vgic_v5_ist_istsz(id_bits);
> +	l2sz = vgic_v5_ist_l2sz();
> +
> +	/*
> +	 * Determine if we want to create a Linear or a Two-Level IST.
> +	 *
> +	 * If we require more than one page for the IST, create a Two-Level IST
> +	 * (if the host supports it, which is likely).
> +	 *
> +	 * Note: GICv5's istsz is not the size of the ISTEs in log2(bytes). It
> +	 * is 2 less, hence the +2 below.
> +	 */
> +	two_level = gicv5_host_ist_caps.ist_levels &&
> +		id_bits > PAGE_SHIFT - (2 + istsz);
> +
> +	if (!two_level)
> +		ret = vgic_v5_alloc_linear_ist(kvm, false /* LPIs, not SPIs */,
> +					       id_bits, istsz);
> +	else
> +		ret = vgic_v5_alloc_two_level_lpi_ist(kvm, id_bits, istsz,
> +						      l2sz);
> +
> +	if (ret)
> +		return ret;
> +
> +	phys_addr = virt_to_phys(vmi->h_lpi_ist);
> +	ret = vgic_v5_vmte_assign_ist(kvm, phys_addr, two_level, id_bits, l2sz,
> +				      istsz, false);
> +	if (ret)
> +		vgic_v5_free_allocated_lpi_ist(vmi, id_bits, istsz, l2sz);
> +
> +	return ret;
> +}
> +
> +/* Free the LPI IST again */
> +int vgic_v5_lpi_ist_free(struct kvm *kvm)
> +{
> +	u16 vm_id = vgic_v5_vm_id(kvm);
> +	struct vgic_v5_vm_info *vmi;
> +
> +	vmi = xa_load(&vm_info, vm_id);
> +	if (!vmi)
> +		return -ENXIO;
> +
> +	if (!vmi->h_lpi_ist_structure)
> +		return vgic_v5_linear_ist_free(kvm, false);
> +	else
> +		return vgic_v5_two_level_ist_free(kvm, false);
> +}
> diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h b/arch/arm64/kvm/vgic/vgic-v5-tables.h
> index 5501a44308362..37e220cda1987 100644
> --- a/arch/arm64/kvm/vgic/vgic-v5-tables.h
> +++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
> @@ -54,6 +54,13 @@ struct vmtl2_entry {
>  #define GICV5_VMTEL2E_IST_STRUCTURE	BIT_ULL(58)
>  #define GICV5_VMTEL2E_IST_ID_BITS	GENMASK_ULL(63, 59)
>  
> +/*
> + * The LPI and SPI configuration is stored in the 2nd and 3rd 64-bit chunks of
> + * the VMTE (0-based).
> + */
> +#define GICV5_VMTEL2_LPI_SECTION	2
> +#define GICV5_VMTEL2_SPI_SECTION	3
> +
>  /* Virtual PE Table Entry */
>  typedef __le64 vpe_entry;
>  #define GICV5_VPE_VALID			BIT_ULL(0)
> @@ -66,6 +73,12 @@ struct vgic_v5_vm_info {
>  	vpe_entry __iomem	*vpet_base;
>  	void __iomem		**vped_ptrs;
>  	u8			vpe_id_bits;
> +
> +	/* Tracking for the hyp-owned ISTs */
> +	bool			h_lpi_ist_structure;
> +	__le64			*h_lpi_ist;
> +	__le64			**h_lpi_l2_ists;
> +	__le64			*h_spi_ist;

Can you please document what these individual fields represent? I'm
not sure what "hyp-owned" means here...
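
Something like the below (these are my guesses, please correct them)
would already help:

	/* Shadow ISTs owned by the host and handed to the IRS via the VMTE */
	bool			h_lpi_ist_structure;	/* true if two-level */
	__le64			*h_lpi_ist;		/* LPI IST base (L1 if two-level) */
	__le64			**h_lpi_l2_ists;	/* L2 IST pointers, one per L1E */
	__le64			*h_spi_ist;		/* linear SPI IST */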

>  };
>  
>  struct vgic_v5_vmt {
> @@ -146,4 +159,13 @@ int vgic_v5_vmte_release(struct kvm *kvm);
>  int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu);
>  int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu);
>  
> +int vgic_v5_vmte_assign_ist(struct kvm *kvm, phys_addr_t ist_base,
> +			    bool two_level, unsigned int id_bits,
> +			    unsigned int l2sz, unsigned int istsz, bool spi_ist);
> +int vgic_v5_spi_ist_allocate(struct kvm *kvm, phys_addr_t *base_addr,
> +			     unsigned int id_bits, unsigned int istsz);
> +void vgic_v5_free_allocated_spi_ist(struct kvm *kvm);
> +int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits);
> +int vgic_v5_lpi_ist_free(struct kvm *kvm);
> +
>  #endif
> diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
> index 89579ee04f5d1..ccec0a045927c 100644
> --- a/include/linux/irqchip/arm-gic-v5.h
> +++ b/include/linux/irqchip/arm-gic-v5.h
> @@ -450,6 +450,9 @@ enum gicv5_vcpu_info_cmd_type {
>  	VMT_L2_MAP,		/* Map in a L2 VMT - *may* happen on VM init */
>  	VMTE_MAKE_VALID,	/* Make the VMTE valid */
>  	VMTE_MAKE_INVALID,	/* Make the VMTE (et al.) invalid */
> +	SPI_VIST_MAKE_VALID,	/* No corresponding invalid */
> +	LPI_VIST_MAKE_VALID,	/* Triggered by a guest */
> +	LPI_VIST_MAKE_INVALID,	/* Triggered by a guest */
>  };
>  
>  struct gicv5_cmd_info {

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



Thread overview: 66+ messages
2026-04-27 16:06 [PATCH 00/43] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-04-27 16:06 ` [PATCH 01/43] arm64/sysreg: Add GICv5 GIC VDPEND and VDRCFG encodings Sascha Bischoff
2026-04-27 16:06 ` [PATCH 02/43] arm64/sysreg: Update ICC_CR0_EL1 with LINK and LINK_IDLE fields Sascha Bischoff
2026-04-27 16:07 ` [PATCH 03/43] KVM: arm64: gic-v5: Add resident/non-resident hyp calls Sascha Bischoff
2026-04-28 14:28   ` Marc Zyngier
2026-05-01 16:40     ` Sascha Bischoff
2026-04-27 16:07 ` [PATCH 04/43] irqchip/gic-v5: Provide IRS config frame attrs to KVM Sascha Bischoff
2026-04-28 14:56   ` Marc Zyngier
2026-05-01 16:46     ` Sascha Bischoff
2026-04-27 16:07 ` [PATCH 05/43] KVM: arm64: gic-v5: Extract host IRS caps from IRS config frame Sascha Bischoff
2026-04-28 15:20   ` Marc Zyngier
2026-05-01 16:44     ` Sascha Bischoff
2026-04-27 16:08 ` [PATCH 06/43] KVM: arm64: gic-v5: Add VPE doorbell domain Sascha Bischoff
2026-04-28 16:40   ` Marc Zyngier
2026-05-01 16:54     ` Sascha Bischoff
2026-04-27 16:08 ` [PATCH 07/43] KVM: arm64: gic-v5: Create & manage VM and VPE tables Sascha Bischoff
2026-04-28 14:54   ` Vladimir Murzin
2026-05-01 16:42     ` Sascha Bischoff
2026-04-28 15:55   ` Joey Gouly
2026-04-29 10:25   ` Marc Zyngier
2026-04-27 16:08 ` [PATCH 08/43] KVM: arm64: gic-v5: Introduce guest IST alloc and management Sascha Bischoff
2026-04-29 14:29   ` Marc Zyngier [this message]
2026-04-27 16:09 ` [PATCH 09/43] KVM: arm64: gic-v5: Implement VMT/vIST IRS MMIO Ops Sascha Bischoff
2026-04-29 12:50   ` Joey Gouly
2026-04-29 16:04   ` Marc Zyngier
2026-04-27 16:09 ` [PATCH 10/43] KVM: arm64: gic-v5: Implement VPE " Sascha Bischoff
2026-04-30  8:46   ` Marc Zyngier
2026-04-27 16:09 ` [PATCH 11/43] KVM: arm64: gic-v5: Make VPEs valid in vgic_v5_reset() Sascha Bischoff
2026-04-30  9:37   ` Marc Zyngier
2026-04-27 16:10 ` [PATCH 12/43] KVM: arm64: gic-v5: Clear db_fired flag before making VPE non-resident Sascha Bischoff
2026-04-27 16:10 ` [PATCH 13/43] KVM: arm64: gic-v5: Make VPEs (non-)resident in vgic_load/put Sascha Bischoff
2026-04-30 10:26   ` Marc Zyngier
2026-04-27 16:10 ` [PATCH 14/43] KVM: arm64: gic-v5: Request VPE doorbells when going non-resident Sascha Bischoff
2026-04-30 10:37   ` Marc Zyngier
2026-04-27 16:11 ` [PATCH 15/43] KVM: arm64: gic-v5: Handle doorbells in kvm_vgic_vcpu_pending_irq() Sascha Bischoff
2026-04-27 16:11 ` [PATCH 16/43] KVM: arm64: gic-v5: Initialise and teardown VMTEs & doorbells Sascha Bischoff
2026-04-30 12:23   ` Marc Zyngier
2026-04-27 16:11 ` [PATCH 17/43] KVM: arm64: gic-v5: Enable VPE DBs on VPE reset and disable on teardown Sascha Bischoff
2026-04-27 16:12 ` [PATCH 18/43] KVM: arm64: gic-v5: Define remaining IRS MMIO registers Sascha Bischoff
2026-04-27 16:12 ` [PATCH 19/43] KVM: arm64: gic-v5: Introduce struct vgic_v5_irs and IRS base address Sascha Bischoff
2026-04-27 16:12 ` [PATCH 20/43] KVM: arm64: gic-v5: Add IRS IODEV to iodev_types and generic MMIO handlers Sascha Bischoff
2026-04-27 16:13 ` [PATCH 21/43] KVM: arm64: gic-v5: Add KVM_VGIC_V5_ADDR_TYPE_IRS to UAPI Sascha Bischoff
2026-04-27 16:13 ` [PATCH 22/43] KVM: arm64: gic-v5: Add GICv5 IRS IODEV and MMIO emulation Sascha Bischoff
2026-04-27 16:13 ` [PATCH 23/43] KVM: arm64: gic-v5: Set IRICHPPIDIS based on IRS enable state Sascha Bischoff
2026-04-27 16:14 ` [PATCH 24/43] KVM: arm64: gic-v5: Call IRS init/teardown from vgic_v5 init/teardown Sascha Bischoff
2026-04-27 16:14 ` [PATCH 25/43] KVM: arm64: gic-v5: Register the IRS IODEV Sascha Bischoff
2026-04-27 16:14 ` [PATCH 26/43] Documentation: KVM: Extend VGICv5 docs for KVM_VGIC_V5_ADDR_TYPE_IRS Sascha Bischoff
2026-04-27 16:15 ` [PATCH 27/43] KVM: arm64: selftests: Update vGICv5 selftest to set IRS address Sascha Bischoff
2026-04-27 16:15 ` [PATCH 28/43] KVM: arm64: gic-v5: Introduce SPI AP list Sascha Bischoff
2026-04-27 16:15 ` [PATCH 29/43] KVM: arm64: gic-v5: Add GIC VDPEND and GIC VDRCFG hyp calls Sascha Bischoff
2026-04-27 16:16 ` [PATCH 30/43] KVM: arm64: gic-v5: Track SPI state for in-flight SPIs Sascha Bischoff
2026-04-27 16:16 ` [PATCH 31/43] KVM: arm64: gic: Introduce set_pending_state() to irq_op Sascha Bischoff
2026-04-27 16:16 ` [PATCH 32/43] KVM: arm64: gic-v5: Support SPI injection Sascha Bischoff
2026-04-27 16:17 ` [PATCH 33/43] KVM: arm64: gic-v5: Add GICv5 SPI injection to irqfd Sascha Bischoff
2026-04-27 16:17 ` [PATCH 34/43] KVM: arm64: gic-v5: Mask per-vcpu PPI state in vgic_v5_finalize_ppi_state() Sascha Bischoff
2026-04-27 16:17 ` [PATCH 35/43] KVM: arm64: gic-v5: Add GICv5 EL1 sysreg userspace set/get interface Sascha Bischoff
2026-04-27 16:18 ` [PATCH 36/43] KVM: arm64: gic-v5: Implement save/restore mechanisms for ISTs Sascha Bischoff
2026-05-01 18:54   ` Vladimir Murzin
2026-04-27 16:18 ` [PATCH 37/43] KVM: arm64: gic-v5: Handle userspace accesses to IRS MMIO region Sascha Bischoff
2026-04-27 16:19 ` [PATCH 38/43] KVM: arm64: gic-v5: Add VGIC_GRP_IRS_REGS/VGIC_GRP_IST to UAPI Sascha Bischoff
2026-04-27 16:19 ` [PATCH 39/43] KVM: arm64: gic-v5: Plumb in has/set/get_attr for sysregs & IRS MMIO regs Sascha Bischoff
2026-04-27 16:19 ` [PATCH 40/43] Documentation: KVM: Document KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS for VGICv5 Sascha Bischoff
2026-04-27 16:20 ` [PATCH 41/43] Documentation: KVM: Add KVM_DEV_ARM_VGIC_GRP_IRS_REGS to VGICv5 docs Sascha Bischoff
2026-04-27 16:20 ` [PATCH 42/43] Documentation: KVM: Add docs for KVM_DEV_ARM_VGIC_GRP_IST Sascha Bischoff
2026-04-27 16:20 ` [PATCH 43/43] Documentation: KVM: Add the VGICv5 IRS save/restore sequences Sascha Bischoff
2026-04-30  8:57   ` Peter Maydell
