Date: Wed, 04 Mar 2026 13:08:33 +0000
Message-ID: <861phz91ym.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Sascha Bischoff <Sascha.Bischoff@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, nd@arm.com, oliver.upton@linux.dev,
	Joey Gouly <Joey.Gouly@arm.com>, Suzuki Poulose <Suzuki.Poulose@arm.com>,
	yuzenghui@huawei.com, peter.maydell@linaro.org, lpieralisi@kernel.org,
	Timothy Hayes <Timothy.Hayes@arm.com>, jonathan.cameron@huawei.com
Subject: Re: [PATCH v5 19/36] KVM: arm64: gic-v5: Implement PPI interrupt injection
In-Reply-To: <20260226155515.1164292-20-sascha.bischoff@arm.com>
References: <20260226155515.1164292-1-sascha.bischoff@arm.com>
	<20260226155515.1164292-20-sascha.bischoff@arm.com>
On Thu, 26 Feb 2026 16:00:21 +0000,
Sascha Bischoff wrote:
>
> This change introduces interrupt injection for PPIs for GICv5-based
> guests.
>
> The lifecycle of PPIs is largely managed by the hardware for a GICv5
> system. The hypervisor injects pending state into the guest by using
> the ICH_PPI_PENDRx_EL2 registers. These are used by the hardware to
> pick a Highest Priority Pending Interrupt (HPPI) for the guest based
> on the enable state of each individual interrupt. The enable state and
> priority for each interrupt are provided by the guest itself (through
> writes to the PPI registers).
>
> When Direct Virtual Interrupt (DVI) is set for a particular PPI, the
> hypervisor is even able to skip the injection of the pending state
> altogether - it all happens in hardware.
>
> The result of the above is that no AP lists are required for GICv5,
> unlike for older GICs. Instead, for PPIs the ICH_PPI_* registers
> fulfil the same purpose for all 128 PPIs. Hence, as long as the
> ICH_PPI_* registers are populated prior to guest entry, and merged
> back into the KVM shadow state on exit, the PPI state is preserved,
> and interrupts can be injected.
>
> When injecting the state of a PPI, the state is merged into the
> PPI-specific vgic_irq structure. The PPIs are made pending via the
> ICH_PPI_PENDRx_EL2 registers, the value of which is generated from the
> vgic_irq structures for each PPI exposed on guest entry. The
> queue_irq_unlock() irq_op is required to kick the vCPU to ensure that
> it sees the new state. The result is that no AP lists are used for
> private interrupts on GICv5.
>
> Prior to entering the guest, vgic_v5_flush_ppi_state() is called from
> kvm_vgic_flush_hwstate(). This generates the pending state to inject
> into the guest, and snapshots it (twice - an entry and an exit copy)
> in order to track any changes. These changes can come from a guest
> consuming an interrupt or from a guest making an edge-triggered
> interrupt pending.
>
> When returning from running a guest, the guest's PPI state is merged
> back into KVM's vgic_irq state in vgic_v5_fold_ppi_state() from
> kvm_vgic_sync_hwstate(). The enable and active state is synced back for
> all PPIs, and the pending state is synced back for edge PPIs (level is
> driven directly by the devices generating said levels). The incoming
> pending state from the guest is merged with KVM's shadow state to
> avoid losing any incoming interrupts.
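As a reading aid for the above (this is just my paraphrase, not code
from the patch, and the names are placeholders): the exit-time change
tracking boils down to, per 64-bit ICH_PPI register,

	/* Which PPIs flipped active or pending while the guest ran? */
	changed  = shadow_activer[reg] ^ activer_exit[reg];
	changed |= pendr_entry[reg] ^ pendr_exit[reg];
	/* DVI PPIs are fully handled by HW - nothing to fold back */
	changed &= ~dvir[reg];

with the enable state needing no diffing at all, since guest writes to
it trap and the shadow copy is always current.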
>
> Signed-off-by: Sascha Bischoff
> Reviewed-by: Jonathan Cameron
> ---
>  arch/arm64/kvm/vgic/vgic-v5.c | 160 ++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/vgic/vgic.c    |  40 +++++++--
>  arch/arm64/kvm/vgic/vgic.h    |  25 ++++--
>  3 files changed, 209 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
> index db2225aefb130..a230c45db46ee 100644
> --- a/arch/arm64/kvm/vgic/vgic-v5.c
> +++ b/arch/arm64/kvm/vgic/vgic-v5.c
> @@ -132,6 +132,166 @@ int vgic_v5_finalize_ppi_state(struct kvm *kvm)
>  	return 0;
>  }
>  
> +/*
> + * For GICv5, the PPIs are mostly directly managed by the hardware. We (the
> + * hypervisor) handle the pending, active, enable state save/restore, but don't
> + * need the PPIs to be queued on a per-VCPU AP list. Therefore, sanity check the
> + * state, unlock, and return.
> + */
> +static bool vgic_v5_ppi_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> +					 unsigned long flags)
> +	__releases(&irq->irq_lock)
> +{
> +	struct kvm_vcpu *vcpu;
> +
> +	lockdep_assert_held(&irq->irq_lock);
> +
> +	if (WARN_ON_ONCE(!__irq_is_ppi(KVM_DEV_TYPE_ARM_VGIC_V5, irq->intid)))
> +		goto out_unlock_fail;
> +
> +	vcpu = irq->target_vcpu;
> +	if (WARN_ON_ONCE(!vcpu))
> +		goto out_unlock_fail;
> +
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> +	/* Directly kick the target VCPU to make sure it sees the IRQ */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> +	kvm_vcpu_kick(vcpu);
> +
> +	return true;
> +
> +out_unlock_fail:
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> +	return false;
> +}
> +
> +static struct irq_ops vgic_v5_ppi_irq_ops = {
> +	.queue_irq_unlock = vgic_v5_ppi_queue_irq_unlock,
> +};
> +
> +void vgic_v5_set_ppi_ops(struct vgic_irq *irq)
> +{
> +	if (WARN_ON(!irq))
> +		return;
> +
> +	guard(raw_spinlock_irqsave)(&irq->irq_lock);
> +
> +	if (!WARN_ON(irq->ops))
> +		irq->ops = &vgic_v5_ppi_irq_ops;
> +}
> +
> +/*
> + * Detect any PPI state changes, and propagate the state to KVM's
> + * shadow structures.
> + */
> +void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
> +
> +	for (int reg = 0; reg < 2; reg++) {
> +		const u64 activer = host_data_ptr(vgic_v5_ppi_state)->activer_exit[reg];
> +		const u64 pendr = host_data_ptr(vgic_v5_ppi_state)->pendr_exit[reg];
> +		unsigned long changed_bits;
> +		int i;
> +
> +		/*
> +		 * Track what changed across activer, pendr, but mask with
> +		 * ~DVI.
> +		 */
> +		changed_bits = cpu_if->vgic_ppi_activer[reg] ^ activer;
> +		changed_bits |= host_data_ptr(vgic_v5_ppi_state)->pendr_entry[reg] ^ pendr;
> +		changed_bits &= ~cpu_if->vgic_ppi_dvir[reg];
> +
> +		for_each_set_bit(i, &changed_bits, 64) {
> +			struct vgic_irq *irq;
> +			u32 intid;
> +
> +			intid = FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI);
> +			intid |= FIELD_PREP(GICV5_HWIRQ_ID, reg * 64 + i);
> +
> +			irq = vgic_get_vcpu_irq(vcpu, intid);
> +
> +			scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
> +				irq->active = !!(activer & BIT(i));
> +
> +				/*
> +				 * This is an OR to avoid losing incoming
> +				 * edges!
> +				 */
> +				if (irq->config == VGIC_CONFIG_EDGE)
> +					irq->pending_latch |= !!(pendr & BIT(i));
> +			}
> +
> +			vgic_put_irq(vcpu->kvm, irq);
> +		}
> +
> +		/*
> +		 * Re-inject the exit state as entry state next time!
> +		 *
> +		 * Note that the write of the Enable state is trapped, and hence
> +		 * there is nothing to explicitly sync back here as we already
> +		 * have the latest copy by definition.
> +		 */
> +		cpu_if->vgic_ppi_activer[reg] = activer;
> +	}

I think this whole thing would benefit from using bitmap operations
rather than these nested loops. I wrote the following, which isn't
very nice either (too many casts), but could be improved by either
changing the underlying types to be actual bitmaps or using
bitmap_from_arr64()...

void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu)
{
	struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
	DECLARE_BITMAP(changed_pending, 128);
	DECLARE_BITMAP(changed_active, 128);
	DECLARE_BITMAP(changed_bits, 128);
	unsigned long *activer, *pendr;
	int i;

	activer = (unsigned long *)&host_data_ptr(vgic_v5_ppi_state)->activer_exit;
	pendr = (unsigned long *)&host_data_ptr(vgic_v5_ppi_state)->pendr_exit;

	bitmap_xor(changed_active,
		   (unsigned long *)cpu_if->vgic_ppi_activer, activer, 128);
	bitmap_xor(changed_pending,
		   (unsigned long *)host_data_ptr(vgic_v5_ppi_state)->pendr_entry,
		   pendr, 128);
	bitmap_or(changed_bits, changed_active, changed_pending, 128);

	for_each_set_bit(i, changed_bits, 128) {
		struct vgic_irq *irq;
		bool active;
		u32 intid;

		intid = FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI);
		intid |= FIELD_PREP(GICV5_HWIRQ_ID, i);

		irq = vgic_get_vcpu_irq(vcpu, intid);
		active = test_bit(i, activer);

		scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
			irq->active = active;

			/*
			 * This is an OR to avoid losing incoming
			 * edges!
			 */
			if (irq->config == VGIC_CONFIG_EDGE)
				irq->pending_latch |= test_bit(i, pendr);
		}

		/*
		 * Re-inject the exit state as entry state next time!
		 *
		 * Note that the write of the Enable state is trapped, and
		 * hence there is nothing to explicitly sync back here as we
		 * already have the latest copy by definition.
		 */
		__assign_bit(i, (unsigned long *)cpu_if->vgic_ppi_activer, active);

		vgic_put_irq(vcpu->kvm, irq);
	}
}

> +}
> +
> +void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long pendr[2];
> +
> +	/*
> +	 * Time to enter the guest - we first need to build the guest's
> +	 * ICC_PPI_PENDRx_EL1, however.
> +	 */
> +	pendr[0] = 0;
> +	pendr[1] = 0;
> +	for (int reg = 0; reg < 2; reg++) {
> +		u64 mask = vcpu->kvm->arch.vgic.gicv5_vm.vgic_ppi_mask[reg];
> +		unsigned long bm_p = 0;
> +		int i;
> +
> +		bitmap_from_arr64(&bm_p, &mask, 64);

Given that you are already converting a 64bit quantity, you could bite
the bullet and do all 128 bits at once.

> +
> +		for_each_set_bit(i, &bm_p, 64) {
> +			struct vgic_irq *irq;
> +			u32 intid;
> +
> +			intid = FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI);
> +			intid |= FIELD_PREP(GICV5_HWIRQ_ID, reg * 64 + i);
> +
> +			irq = vgic_get_vcpu_irq(vcpu, intid);
> +
> +			scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
> +				if (irq_is_pending(irq))
> +					__assign_bit(i % 64, &pendr[reg], 1);
> +			}
> +
> +			vgic_put_irq(vcpu->kvm, irq);
> +		}
> +	}
> +
> +	/*
> +	 * Copy the shadow state to the pending reg that will be written to the
> +	 * ICH_PPI_PENDRx_EL2 regs. While the guest is running we track any
> +	 * incoming changes to the pending state in the vgic_irq structures. The
> +	 * incoming changes are merged with the outgoing changes on the return
> +	 * path.
> +	 */
> +	host_data_ptr(vgic_v5_ppi_state)->pendr_entry[0] = pendr[0];
> +	host_data_ptr(vgic_v5_ppi_state)->pendr_entry[1] = pendr[1];
> +
> +	/*
> +	 * Make sure that we can correctly detect "edges" in the PPI
> +	 * state. There's a path where we never actually enter the guest, and
> +	 * failure to do this risks losing pending state.
> +	 */
> +	host_data_ptr(vgic_v5_ppi_state)->pendr_exit[0] = pendr[0];
> +	host_data_ptr(vgic_v5_ppi_state)->pendr_exit[1] = pendr[1];
> +}
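To make that concrete, here is an untested sketch of the
all-128-bits-at-once variant, assuming vgic_ppi_mask and the
pendr_entry/pendr_exit fields stay contiguous u64[2] arrays so that
the bitmap_{from,to}_arr64() conversions apply:

void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
{
	DECLARE_BITMAP(mask, 128);
	DECLARE_BITMAP(pendr, 128);
	int i;

	/* Convert the whole 128-bit PPI mask in one go */
	bitmap_from_arr64(mask, vcpu->kvm->arch.vgic.gicv5_vm.vgic_ppi_mask, 128);
	bitmap_zero(pendr, 128);

	for_each_set_bit(i, mask, 128) {
		struct vgic_irq *irq;
		u32 intid;

		intid = FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI);
		intid |= FIELD_PREP(GICV5_HWIRQ_ID, i);

		irq = vgic_get_vcpu_irq(vcpu, intid);

		scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
			if (irq_is_pending(irq))
				__set_bit(i, pendr);
		}

		vgic_put_irq(vcpu->kvm, irq);
	}

	/* Entry and exit snapshots, as in the original */
	bitmap_to_arr64(host_data_ptr(vgic_v5_ppi_state)->pendr_entry, pendr, 128);
	bitmap_to_arr64(host_data_ptr(vgic_v5_ppi_state)->pendr_exit, pendr, 128);
}

This keeps a single loop over the full PPI space and drops the
reg/i%64 arithmetic entirely.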
> +
>  /*
>   * Sets/clears the corresponding bit in the ICH_PPI_DVIR register.
>   */
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index 49d65e8cc742b..69bfa0f81624c 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -105,6 +105,18 @@ struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
>  	if (WARN_ON(!vcpu))
>  		return NULL;
>  
> +	if (vgic_is_v5(vcpu->kvm)) {
> +		u32 int_num, hwirq_id;
> +
> +		if (!__irq_is_ppi(KVM_DEV_TYPE_ARM_VGIC_V5, intid))
> +			return NULL;
> +
> +		hwirq_id = FIELD_GET(GICV5_HWIRQ_ID, intid);
> +		int_num = array_index_nospec(hwirq_id, VGIC_V5_NR_PRIVATE_IRQS);
> +
> +		return &vcpu->arch.vgic_cpu.private_irqs[int_num];
> +	}
> +
>  	/* SGIs and PPIs */
>  	if (intid < VGIC_NR_PRIVATE_IRQS) {
>  		intid = array_index_nospec(intid, VGIC_NR_PRIVATE_IRQS);
> @@ -825,9 +837,11 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  	vgic_release_deleted_lpis(vcpu->kvm);
>  }
>  
> -static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
> +static void vgic_fold_state(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vgic_global_state.type == VGIC_V2)
> +	if (vgic_is_v5(vcpu->kvm))
> +		vgic_v5_fold_ppi_state(vcpu);
> +	else if (kvm_vgic_global_state.type == VGIC_V2)
>  		vgic_v2_fold_lr_state(vcpu);
>  	else
>  		vgic_v3_fold_lr_state(vcpu);
> @@ -1034,8 +1048,10 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  	if (can_access_vgic_from_kernel())
>  		vgic_save_state(vcpu);
>  
> -	vgic_fold_lr_state(vcpu);
> -	vgic_prune_ap_list(vcpu);
> +	vgic_fold_state(vcpu);
> +
> +	if (!vgic_is_v5(vcpu->kvm))
> +		vgic_prune_ap_list(vcpu);

I'm starting to think we should have per-GIC implementations of these
things. This is becoming very tortuous.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.