Date: Wed, 10 Jun 2026 17:00:25 +0100
Message-ID: <865x3qtmg6.wl-maz@kernel.org>
From: Marc Zyngier
To: Hyunwoo Kim
Cc: Oliver Upton, joey.gouly@arm.com, seiden@linux.ibm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com,
	will@kernel.org, Sascha.Bischoff@arm.com, jic23@kernel.org,
	timothy.hayes@arm.com, andre.przywara@arm.com,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Subject: Re: [PATCH] KVM: arm64: vgic: Check the interrupt is still ours before migrating it
References: <87ecila0w3.wl-maz@kernel.org>

On Wed, 10 Jun 2026 14:52:10 +0100, Hyunwoo Kim wrote:
>
> On Fri, Jun 05, 2026 at 01:43:32AM -0700, Oliver Upton wrote:
> > On Fri, Jun 05, 2026 at 08:42:52AM +0100, Marc Zyngier wrote:
> > > On Fri, 05 Jun 2026 07:00:37 +0100,
> > > Oliver Upton wrote:
> > > >
> > > > On Fri, Jun 05, 2026 at 05:59:15AM +0900, Hyunwoo Kim wrote:
> > > > > vgic_prune_ap_list() drops both ap_list_lock and irq_lock while migrating
> > > > > an interrupt to another vCPU. After reacquiring the locks it only checks
> > > > > that the affinity is unchanged (target_vcpu == vgic_target_oracle(irq))
> > > > > before moving the interrupt, which assumes that an interrupt whose affinity
> > > > > is preserved is still queued on this vCPU's ap_list.
> > > > >
> > > > > That assumption no longer holds if the interrupt is taken off the ap_list
> > > > > while the locks are dropped. vgic_flush_pending_lpis() removes the
> > > > > interrupt from the list and sets irq->vcpu to NULL, but leaves
> > > > > enabled/pending/target_vcpu untouched. As the interrupt is still enabled
> > > > > and pending, vgic_target_oracle() returns the same target_vcpu, so the
> > > > > affinity check passes and list_del() is run a second time on an entry that
> > > > > has already been removed.
> > > > >
> > > > > Also check that the interrupt is still assigned to this vCPU
> > > > > (irq->vcpu == vcpu) before moving it.
> > > > >
> > > > > Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> > > > > Signed-off-by: Hyunwoo Kim
> > > >
> > > > Looking at this and the other VGIC patch you sent (which should've been
> > > > a combined series), are you trying to deal with a vCPU writing to
> > > > another vCPU's redistributor? I.e. vCPU B setting GICR_CTLR.EnableLPIs=0
> > > > behind the back of vCPU A?
> > > >
> > > > That is extremely relevant information as the off-the-cuff reaction is
> > > > that no race exists. But since the GIC architecture is awesome and
> > > > allows for this sort of insanity, it obviously does....
> > > >
> > > > Anyway, for LPIs resident on a particular RD, there's zero expectation
> > > > that the pending state is preserved when EnableLPIs=0. So I'd rather
> > > > vgic_flush_pending_lpis() just invalidate the pending state.
> > >
> > > Just clearing the pending state introduces a potential problem as we
> > > now have an interrupt that is neither active nor pending on the AP
> > > list. It is not impossible to solve (we now have similar behaviours
> > > with SPI deactivation from another vcpu), but that requires posting a
> > > KVM_REQ_VGIC_PROCESS_UPDATE to the target vcpu.
> >
> > Right, I was suggesting that in addition to deleting the LPI from the AP
> > list we actually invalidate the pending state so that someone sitting on
> > a pointer to a to-be-freed LPI sees vgic_target_oracle() returning
> > NULL
> >
> > > > Beyond that, I see two other fixes for lifetime issues around the
> > > > vgic_irq in the middle of migration. I'd like to see explicit RCU
> > > > protection around the release && reacquire of the ap_list_lock rather
> > > > than depending on the precondition that IRQs are disabled.
> > >
> > > I'm not sure I follow. Are you suggesting turning the AP list into an
> > > RCU protected list?
> >
> > No, sorry, I should expand a little.
> >
> > We store a reference on the vgic_irq struct in the AP list, which is
> > stable so long as the ap_list_lock is held. It should be possible for
> > the refcount to drop to 0 between releasing the ap_list_lock and
> > reacquiring it.
> >
> > So either vgic_prune_ap_list() takes an additional reference on the
> > vgic_irq before dropping the ap_list_lock or rely on RCU to protect
> > vgic_irq structs observed with a non-zero refcount.
>
> What are your thoughts on this approach?
>
>
> Best regards,
> Hyunwoo Kim
>
> ---
>
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 933983bb2005..7fb871c3ccd8 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -523,7 +523,7 @@ static void __kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	 * Retire all pending LPIs on this vcpu anyway as we're
>  	 * going to destroy it.
>  	 */
> -	vgic_flush_pending_lpis(vcpu);
> +	vgic_flush_pending_lpis(vcpu, true);
>  
>  	INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
>  	kfree(vgic_cpu->private_irqs);
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 5913a20d8301..f85d63f17af0 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -303,7 +303,7 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
>  		if (ctlr != GICR_CTLR_ENABLE_LPIS)
>  			return;
>  
> -		vgic_flush_pending_lpis(vcpu);
> +		vgic_flush_pending_lpis(vcpu, false);
>  		vgic_its_invalidate_all_caches(vcpu->kvm);
>  		atomic_set_release(&vgic_cpu->ctlr, 0);
>  	} else {
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index 1e9fe8764584..09629a38fc0a 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -192,7 +192,7 @@ static void vgic_release_deleted_lpis(struct kvm *kvm)
>  	xa_unlock_irqrestore(&dist->lpi_xa, flags);
>  }
>  
> -void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
> +void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu, bool destroy)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_irq *irq, *tmp;
> @@ -204,6 +204,13 @@ void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
>  	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
>  		if (irq_is_lpi(vcpu->kvm, irq->intid)) {
>  			raw_spin_lock(&irq->irq_lock);
> +			/* Leave interrupts pending a migration for prune. */
> +			if (!destroy && irq->vcpu != vgic_target_oracle(irq)) {
> +				raw_spin_unlock(&irq->irq_lock);
> +				continue;
> +			}

It's rather unclear to me what the semantics of this are. If vcpu-a
decides to nuke the LPIs of vcpu-b and the LPI had in the meantime
been migrated to vcpu-c, but obviously not observed by vcpu-c yet as
the LPI is still on vcpu-b's AP-list, then I don't see the point in
keeping this state. Am I missing something obvious?

> +			/* Pending state is not preserved across EnableLPIs=0. */
> +			irq->pending_latch = false;

That part I agree with.

>  			list_del(&irq->ap_list);
>  			irq->vcpu = NULL;
>  			raw_spin_unlock(&irq->irq_lock);
> @@ -797,6 +804,9 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  
>  		/* This interrupt looks like it has to be migrated. */
>  
> +		/* Keep the interrupt alive while the locks are dropped. */
> +		vgic_get_irq_ref(irq);
> +
>  		raw_spin_unlock(&irq->irq_lock);
>  		raw_spin_unlock(&vgic_cpu->ap_list_lock);
>  
> @@ -839,6 +849,8 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  		raw_spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock);
>  		raw_spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
>  
> +		deleted_lpis |= vgic_put_irq_norelease(vcpu->kvm, irq);
> +
>  		if (target_vcpu_needs_kick) {
>  			kvm_make_request(KVM_REQ_IRQ_PENDING, target_vcpu);
>  			kvm_vcpu_kick(target_vcpu);
> diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> index 9d941241c8a2..c1ac24ede899 100644
> --- a/arch/arm64/kvm/vgic/vgic.h
> +++ b/arch/arm64/kvm/vgic/vgic.h
> @@ -341,7 +341,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu);
>  bool vgic_has_its(struct kvm *kvm);
>  int kvm_vgic_register_its_device(void);
>  void vgic_enable_lpis(struct kvm_vcpu *vcpu);
> -void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu);
> +void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu, bool destroy);
>  int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
>  int vgic_v3_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr);
>  int vgic_v3_dist_uaccess(struct kvm_vcpu *vcpu, bool is_write,
>

I reckon this would work just as well with just the pending state
being removed in vgic_flush_pending_lpis(), and the reference holding
hack in vgic_prune_ap_list().

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
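
For illustration only, the pattern discussed in this thread (take an extra reference before dropping the locks, have the flush clear the pending state, then recheck ownership after reacquiring) can be modelled in plain userspace C. This is a minimal sketch and not the kernel code: struct toy_vcpu, struct toy_irq, toy_oracle(), toy_get(), toy_put(), toy_flush() and toy_migrate() are hypothetical names, locking is elided entirely, and the AP list is reduced to a counter. The racing_flush parameter is an assumption used to simulate another vCPU writing EnableLPIs=0 while the locks are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; they only mirror the fields discussed above. */
struct toy_vcpu {
	int queued;		/* number of interrupts on this vCPU's AP list */
};

struct toy_irq {
	int refcount;		/* stands in for the vgic_irq refcount */
	bool pending;		/* stands in for irq->pending_latch */
	struct toy_vcpu *vcpu;	/* AP list owner, NULL when not queued */
	struct toy_vcpu *target;	/* stands in for irq->target_vcpu */
};

/* Crude model of vgic_target_oracle(): no target unless pending. */
static struct toy_vcpu *toy_oracle(const struct toy_irq *irq)
{
	return irq->pending ? irq->target : NULL;
}

static void toy_get(struct toy_irq *irq)
{
	irq->refcount++;
}

/* Returns true when the last reference was dropped. */
static bool toy_put(struct toy_irq *irq)
{
	return --irq->refcount == 0;
}

/* Model of the EnableLPIs=0 flush: clear pending, dequeue, drop list ref. */
static void toy_flush(struct toy_irq *irq)
{
	if (!irq->vcpu)
		return;
	irq->pending = false;
	irq->vcpu->queued--;
	irq->vcpu = NULL;
	(void)toy_put(irq);
}

/*
 * Model of the migration step. 'racing_flush' simulates another vCPU
 * flushing the interrupt while the locks are dropped. Returns true if
 * the interrupt was moved.
 */
static bool toy_migrate(struct toy_irq *irq, struct toy_vcpu *from,
			bool racing_flush)
{
	struct toy_vcpu *target = toy_oracle(irq);
	bool moved = false;

	if (irq->vcpu != from || !target || target == from)
		return false;

	toy_get(irq);		/* keep irq alive across the unlocked window */
	if (racing_flush)
		toy_flush(irq);

	/* Recheck both ownership and affinity before touching the list. */
	if (irq->vcpu == from && toy_oracle(irq) == target) {
		from->queued--;
		target->queued++;
		irq->vcpu = target;
		moved = true;
	}

	(void)toy_put(irq);	/* freeing at zero is out of scope here */
	return moved;
}
```

In this model, a migration with racing_flush set leaves the interrupt unqueued and decrements the source vCPU's count exactly once, which is the toy equivalent of avoiding the second list_del(). The extra reference means the structure is still valid when the recheck reads it, even though the list's own reference is gone by then.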