Date: Mon, 01 Jul 2024 12:20:46 +0100
From: Marc Zyngier <maz@kernel.org>
To: Nianyao Tang <tangnianyao@huawei.com>
Cc: tglx@linutronix.de, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, guoyang2@huawei.com, wangwudi@hisilicon.com
Subject: Re: [PATCH] irqchip/gic-v4: Fix vcpus racing for vpe->col_idx in vmapp and vmovp
Message-ID: <868qylicj5.wl-maz@kernel.org>
In-Reply-To: <20240701072305.4129823-1-tangnianyao@huawei.com>
References: <20240701072305.4129823-1-tangnianyao@huawei.com>

On Mon, 01 Jul 2024 08:23:05 +0100,
Nianyao Tang <tangnianyao@huawei.com> wrote:
>
> its_map_vm may modify vpe->col_idx without holding vpe->vpe_lock.
> It would result in a vpe resident on one RD after vmovp to a different RD.
> Or, a vpe maybe vmovp to a RD same as it is current mapped in vpe table.
>
> On a 2-ITS, GICv4 enabled system, 32 vcpus deployed on cpu of collection 0
> and 1. Two pci devices route VLPIs, using each of the ITS.
> VPE ready to reside on RD1 may have such unexpected case because another
> vcpu on other cpu is doing vmapp and modify his vpe->col_idx.
>
> Unexpected Case 1:
> RD            0                                 1
>                                        vcpu_load
>                                          lock vpe_lock
>                                          vpe->col_idx = 1
> its_map_vm
>   lock vmovp_lock
>                                        waiting vmovp_lock
>   vpe->col_idx = 0
>   (cpu0 is first online cpu)
>   vmapp vpe on col0
>   unlock vmovp_lock
>                                          lock vmovp_lock
>                                          vmovp vpe to col0
>                                          unlock vmovp_lock
>                                        vpe resident here fail to
>                                        receive VLPI!
>
> Unexpected Case 2:
> RD            0                                 1
> its_map_vm                             vcpu_load
>   lock vmovp_lock                        lock vpe_lock
>   vpe->col_idx = 0
>                                          vpe->col_idx = 1
>   vmapp vpe on col1                      waiting vmovp_lock
>   unlock vmovp_lock
>                                          lock vmovp_lock
>                                          vmovp vpe to col1
>                                          (target RD == source RD!)
>                                          unlock vmovp_lock

Why is this second case a problem? What is the conclusion? This commit
message doesn't explain how you are solving it.

>
>
>
> Signed-off-by: Nianyao Tang <tangnianyao@huawei.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index f99c0a86320b..adda9824e0e7 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1794,11 +1794,15 @@ static bool gic_requires_eager_mapping(void)
>  static void its_map_vm(struct its_node *its, struct its_vm *vm)
>  {
>          unsigned long flags;
> +        bool vm_mapped_on_any_its = false;
> +        int i;
>
>          if (gic_requires_eager_mapping())
>                  return;
>
> -        raw_spin_lock_irqsave(&vmovp_lock, flags);
> +        for (i = 0; i < GICv4_ITS_LIST_MAX; i++)
> +                if (vm->vlpi_count[i] > 0)
> +                        vm_mapped_on_any_its = true;

What makes you think that dropping the vmovp lock is a good idea?
What if you have a concurrent unmap?
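
To make the concern concrete, here is a minimal userspace sketch of the
pattern being questioned (illustrative only, not driver code; the names
mirror the patch but everything else is made up): one thread samples the
reference count without holding the lock to decide whether the VM is
already mapped, while another thread drops the last reference under the
lock and tears the mapping down. The stale decision can leave the VM
unmapped even though the count is non-zero again afterwards.

/* Build with: gcc -pthread race-sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int vlpi_count = 1;      /* one VLPI already routed through this "ITS" */
static bool vm_mapped = true;   /* so the VM is currently "mapped" */

static void *map_path(void *unused)
{
        /* Unlocked peek, mirroring the proposed vm_mapped_on_any_its check */
        bool already_mapped = (vlpi_count > 0);

        pthread_mutex_lock(&lock);
        vlpi_count++;
        if (!already_mapped)    /* stale decision: may skip the "VMAPP" */
                vm_mapped = true;
        pthread_mutex_unlock(&lock);
        (void)unused;
        return NULL;
}

static void *unmap_path(void *unused)
{
        pthread_mutex_lock(&lock);
        if (--vlpi_count == 0)  /* last reference gone: "VMAPP(false)" */
                vm_mapped = false;
        pthread_mutex_unlock(&lock);
        (void)unused;
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, map_path, NULL);
        pthread_create(&b, NULL, unmap_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Possible outcome: vlpi_count == 1 but vm_mapped == false */
        printf("vlpi_count=%d vm_mapped=%d\n", vlpi_count, vm_mapped);
        return 0;
}
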
>
>          /*
>           * If the VM wasn't mapped yet, iterate over the vpes and get
> @@ -1813,15 +1817,19 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
>                          struct its_vpe *vpe = vm->vpes[i];
>                          struct irq_data *d = irq_get_irq_data(vpe->irq);
>
> -                        /* Map the VPE to the first possible CPU */
> -                        vpe->col_idx = cpumask_first(cpu_online_mask);
> +                        raw_spin_lock_irqsave(&vpe->vpe_lock, flags);
> +
> +                        if (!vm_mapped_on_any_its) {
> +                                /* Map the VPE to the first possible CPU */
> +                                vpe->col_idx = cpumask_first(cpu_online_mask);
> +                        }

If the issue is that the target RD isn't initialised before we issue a
VMAPP, then why not initialise it right from the start?

>                          its_send_vmapp(its, vpe, true);
>                          its_send_vinvall(its, vpe);
>                          irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
> +
> +                        raw_spin_unlock_irqrestore(&vpe->vpe_lock, flags);
>                  }
>          }
> -
> -        raw_spin_unlock_irqrestore(&vmovp_lock, flags);
>  }
>
>  static void its_unmap_vm(struct its_node *its, struct its_vm *vm)

I don't think this patch makes much sense. It opens an even bigger race
between map and unmap, and I really don't think we want that. The main
issue is that you are trying to avoid fixing the root cause of the
problem...

The first course of action is to move the vpe->col_idx init and
affinity update to activation time, which would make all
implementations (v4 with and without the VMOVP list, v4.1) behave the
same. On its own, this solves the biggest issue.

The other thing would be to ensure that this lazy VMAPP cannot take
place concurrently with vcpu_load(). We can't take the VPE lock after
the vmovp_lock, as this violates the lock ordering established in
its_vpe_set_affinity(), which locks the VPE before doing anything else.

A possibility would be to take *all* VPE locks *before* taking the
vmovp lock, ensuring that vmapp sees something consistent. But that's
potentially huge, and likely to cause stalls on a busy VM. Instead, a
per-VM lock would do the trick and allow each VPE lock to be taken in
turn.

With that, we can fix the locking correctly and make sure that
vcpu_load and vmapp are mutually exclusive.

I've written the small patch series below (compile-tested only).
Please let me know how it fares for you.

        M.

From a9857d782fc649bc08db953477bed9b8c3421118 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Mon, 1 Jul 2024 10:39:19 +0100
Subject: [PATCH 1/3] irqchip/gic-v4: Always configure affinity on VPE
 activation

We currently have two paths to set the initial affinity of a VPE:

- at activation time on GICv4 without the stupid VMOVP list, and on
  GICv4.1

- at map time for GICv4 with the VMOVP list

The latter location may end up modifying the affinity of a VPE that is
currently running, making the results unpredictable.

Instead, unify the two paths, making sure we only set the initial
affinity at activation time.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e6..a00c5e8c4ea65 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1809,13 +1809,9 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
 
                 for (i = 0; i < vm->nr_vpes; i++) {
                         struct its_vpe *vpe = vm->vpes[i];
-                        struct irq_data *d = irq_get_irq_data(vpe->irq);
 
-                        /* Map the VPE to the first possible CPU */
-                        vpe->col_idx = cpumask_first(cpu_online_mask);
                         its_send_vmapp(its, vpe, true);
                         its_send_vinvall(its, vpe);
-                        irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
                 }
         }
 
@@ -4562,6 +4558,10 @@ static int its_vpe_irq_domain_activate(struct irq_domain *domain,
         struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
         struct its_node *its;
 
+        /* Map the VPE to the first possible CPU */
+        vpe->col_idx = cpumask_first(cpu_online_mask);
+        irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
+
         /*
          * If we use the list map, we issue VMAPP on demand... Unless
          * we're on a GICv4.1 and we eagerly map the VPE on all ITSs
@@ -4570,9 +4570,6 @@ static int its_vpe_irq_domain_activate(struct irq_domain *domain,
         if (!gic_requires_eager_mapping())
                 return 0;
 
-        /* Map the VPE to the first possible CPU */
-        vpe->col_idx = cpumask_first(cpu_online_mask);
-
         list_for_each_entry(its, &its_nodes, entry) {
                 if (!is_v4(its))
                         continue;
@@ -4581,8 +4578,6 @@ static int its_vpe_irq_domain_activate(struct irq_domain *domain,
                 its_send_vinvall(its, vpe);
         }
 
-        irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
-
         return 0;
 }
 
-- 
2.39.2

From e3c53afb6b3b5e9e4dc5963f9f70350455e7e986 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Mon, 1 Jul 2024 12:00:13 +0100
Subject: [PATCH 2/3] irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock

vmovp_lock is abused in a number of cases to serialise updates to
vlpi_count[] and deal with map/unmap of a VM to ITSs.

Instead, provide such a lock and revisit the use of vlpi_count[]
so that it is always wrapped in this per-VM vmapp_lock.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its.c   | 21 +++++++++++----------
 include/linux/irqchip/arm-gic-v4.h |  5 +++++
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index a00c5e8c4ea65..7c37fbe97afbe 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1338,6 +1338,11 @@ static void its_send_vmovp(struct its_vpe *vpe)
          * Wall <-- Head.
          */
         raw_spin_lock_irqsave(&vmovp_lock, flags);
+        /*
+         * Protect against concurrent updates of the mapping state on
+         * individual VMs.
+         */
+        raw_spin_lock(&vpe->its_vm->vmapp_lock);
 
         desc.its_vmovp_cmd.seq_num = vmovp_seq_num++;
         desc.its_vmovp_cmd.its_list = get_its_list(vpe->its_vm);
@@ -1354,6 +1359,7 @@ static void its_send_vmovp(struct its_vpe *vpe)
                 its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
         }
 
+        raw_spin_unlock(&vpe->its_vm->vmapp_lock);
         raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
@@ -1791,12 +1797,10 @@ static bool gic_requires_eager_mapping(void)
 
 static void its_map_vm(struct its_node *its, struct its_vm *vm)
 {
-        unsigned long flags;
-
         if (gic_requires_eager_mapping())
                 return;
 
-        raw_spin_lock_irqsave(&vmovp_lock, flags);
+        guard(raw_spinlock_irqsave)(&vm->vmapp_lock);
 
         /*
          * If the VM wasn't mapped yet, iterate over the vpes and get
@@ -1814,19 +1818,15 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
                         its_send_vinvall(its, vpe);
                 }
         }
-
-        raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
 static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
 {
-        unsigned long flags;
-
         /* Not using the ITS list? Everything is always mapped. */
         if (gic_requires_eager_mapping())
                 return;
 
-        raw_spin_lock_irqsave(&vmovp_lock, flags);
+        guard(raw_spinlock_irqsave)(&vm->vmapp_lock);
 
         if (!--vm->vlpi_count[its->list_nr]) {
                 int i;
@@ -1834,8 +1834,6 @@ static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
                 for (i = 0; i < vm->nr_vpes; i++)
                         its_send_vmapp(its, vm->vpes[i], false);
         }
-
-        raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
 static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
@@ -3922,6 +3920,8 @@ static void its_vpe_invall(struct its_vpe *vpe)
 {
         struct its_node *its;
 
+        guard(raw_spinlock_irqsave)(&vpe->its_vm->vmapp_lock);
+
         list_for_each_entry(its, &its_nodes, entry) {
                 if (!is_v4(its))
                         continue;
@@ -4527,6 +4527,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
         vm->db_lpi_base = base;
         vm->nr_db_lpis = nr_ids;
         vm->vprop_page = vprop_page;
+        raw_spin_lock_init(&vm->vmapp_lock);
 
         if (gic_rdists->has_rvpeid)
                 irqchip = &its_vpe_4_1_irq_chip;
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 2c63375bbd43f..ed879e03f4e64 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -25,6 +25,11 @@ struct its_vm {
         irq_hw_number_t db_lpi_base;
         unsigned long *db_bitmap;
         int nr_db_lpis;
+        /*
+         * Ensures mutual exclusion between updates to vlpi_count[]
+         * and map/unmap when using the ITSList mechanism.
+         */
+        raw_spinlock_t vmapp_lock;
         u32 vlpi_count[GICv4_ITS_LIST_MAX];
 };
 
-- 
2.39.2

From b8092dce0312469a0ea03ada1da854f4ded80e2a Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Mon, 1 Jul 2024 12:03:22 +0100
Subject: [PATCH 3/3] irqchip/gic-v4: Make sure a VPE is locked when VMAPP is
 issued

In order to make sure that vpe->col_idx is correctly sampled when a
VMAPP command is issued, we must hold the lock for the VPE. This is
now possible since the introduction of the per-VM vmapp_lock, which
can be taken before vpe_lock in the locking order.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7c37fbe97afbe..0e1583a82d4e7 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1814,7 +1814,9 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
                 for (i = 0; i < vm->nr_vpes; i++) {
                         struct its_vpe *vpe = vm->vpes[i];
 
+                        raw_spin_lock(&vpe->vpe_lock);
                         its_send_vmapp(its, vpe, true);
+                        raw_spin_unlock(&vpe->vpe_lock);
                         its_send_vinvall(its, vpe);
                 }
         }
@@ -1831,8 +1833,10 @@ static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
         if (!--vm->vlpi_count[its->list_nr]) {
                 int i;
 
-                for (i = 0; i < vm->nr_vpes; i++)
+                for (i = 0; i < vm->nr_vpes; i++) {
+                        guard(raw_spinlock)(&vm->vpes[i]->vpe_lock);
                         its_send_vmapp(its, vm->vpes[i], false);
+                }
         }
 }
 
-- 
2.39.2

-- 
Without deviation from the norm, progress is not possible.