From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 991DB182AB for ; Tue, 11 Jul 2023 10:10:04 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 03B5DC433C7; Tue, 11 Jul 2023 10:10:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1689070204; bh=xvD9qOibcpIE57JzhSjo57C0b5FnYP22lRIrz4WjLd4=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=UKXqKoWtgS/06BqN1/gx6XjfRQLf7alc2DGHshVpPDQ/i/5t4ed1nAge60F23iHZD p8sKkBjYc458c6Vo8aYA83nBK5Bkp+XoyfioPK/jjAAlxd2Seflnb4AMtB7ej5nXZh EuvzNuRYWzCMrAXQUUD1X8cI8EAysLHIQDN6UXHI1nqYgmXm5zTdViiEpQhyxIDaoG p+1JDML4nUb38buoCe3dJ/WyPnfdNJfYbtohCV7BUKDkgZRpZnEPhcLP3QZyOPkg0K jf6OtBqZUZ/XZ0jEur2s8f0GwdG5DbPy4//WURkXn5BDOS6sjQFAIw+AlGxsllEkc2 lm4b//i+QX3tQ== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1qJAJl-00C5JM-Dt; Tue, 11 Jul 2023 11:10:01 +0100 Date: Tue, 11 Jul 2023 11:10:01 +0100 Message-ID: <86edlewyh2.wl-maz@kernel.org> From: Marc Zyngier To: Oliver Upton Cc: kvmarm@lists.linux.dev, James Morse , Suzuki K Poulose , Zenghui Yu , Reiji Watanabe , stable@vger.kernel.org, Yu Zhao Subject: Re: [PATCH v2] KVM: arm64: Correctly handle page aging notifiers for unaligned memslot In-Reply-To: <20230627235405.4069823-1-oliver.upton@linux.dev> References: <20230627235405.4069823-1-oliver.upton@linux.dev> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: kvmarm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: oliver.upton@linux.dev, kvmarm@lists.linux.dev, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, reijiw@google.com, stable@vger.kernel.org, yuzhao@google.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false On Wed, 28 Jun 2023 00:54:05 +0100, Oliver Upton wrote: > > Userspace is allowed to select any PAGE_SIZE aligned hva to back guest > memory. This is even the case with hugepages, although it is a rather > suboptimal configuration as PTE level mappings are used at stage-2. > > The arm64 page aging handlers have an assumption that the specified > range is exactly one page/block of memory, which in the aforementioned > case is not necessarily true. All together this leads to the WARN() in > kvm_age_gfn() firing. > > However, the WARN is only part of the issue as the table walkers visit > at most a single leaf PTE. For hugepage-backed memory in a memslot that > isn't hugepage-aligned, page aging entirely misses accesses to the > hugepage beyond the first page in the memslot. > > Add a new walker dedicated to handling page aging MMU notifiers capable > of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new > walker and drop the WARN that caught the issue in the first place. The > implementation of this walker was inspired by the test_clear_young() > implementation by Yu Zhao [*], but repurposed to address a bug in the > existing aging implementation. > > Cc: stable@vger.kernel.org # v5.15 > Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers") > Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/ > Co-developed-by: Yu Zhao > Signed-off-by: Yu Zhao > Reported-by: Reiji Watanabe > Signed-off-by: Oliver Upton > --- > arch/arm64/include/asm/kvm_pgtable.h | 26 ++++++------------ > arch/arm64/kvm/hyp/pgtable.c | 41 ++++++++++++++++++++++------ > arch/arm64/kvm/mmu.c | 18 ++++++------ > 3 files changed, 49 insertions(+), 36 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h > index dc3c072e862f..75f437e8cd15 100644 > --- a/arch/arm64/include/asm/kvm_pgtable.h > +++ b/arch/arm64/include/asm/kvm_pgtable.h > @@ -556,22 +556,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size); > kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr); > > /** > - * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry. > + * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access > + * flag in a page-table entry. > * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*(). > * @addr: Intermediate physical address to identify the page-table entry. > + * @size: Size of the address range to visit. > + * @mkold: True if the access flag should be cleared. > * > * The offset of @addr within a page is ignored. > * > - * If there is a valid, leaf page-table entry used to translate @addr, then > - * clear the access flag in that entry. > + * Tests and conditionally clears the access flag for every valid, leaf > + * page-table entry used to translate the range [@addr, @addr + @size). > * > * Note that it is the caller's responsibility to invalidate the TLB after > * calling this function to ensure that the updated permissions are visible > * to the CPUs. > * > - * Return: The old page-table entry prior to clearing the flag, 0 on failure. > + * Return: True if any of the visited PTEs had the access flag set. > */ > -kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr); > +bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, > + u64 size, bool mkold); > > /** > * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a > @@ -593,18 +597,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr); > int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, > enum kvm_pgtable_prot prot); > > -/** > - * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the > - * access flag set. > - * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*(). > - * @addr: Intermediate physical address to identify the page-table entry. > - * > - * The offset of @addr within a page is ignored. > - * > - * Return: True if the page-table entry has the access flag set, false otherwise. > - */ > -bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr); > - > /** > * kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point > * of Coherency for guest stage-2 address > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index 5282cb9ca4cf..5d701e9adf5c 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -1153,25 +1153,48 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr) > return pte; > } > > -kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr) > +struct stage2_age_data { > + bool mkold; > + bool young; > +}; > + > +static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx, > + enum kvm_pgtable_walk_flags visit) > { > - kvm_pte_t pte = 0; > - stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF, > - &pte, NULL, 0); > + kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF; > + struct stage2_age_data *data = ctx->arg; > + > + if (!kvm_pte_valid(ctx->old) || new == ctx->old) > + return 0; > + > + data->young = true; > + > + if (data->mkold && !stage2_try_set_pte(ctx, new)) > + return -EAGAIN; > + > /* > * "But where's the TLBI?!", you scream. > * "Over in the core code", I sigh. > * > * See the '->clear_flush_young()' callback on the KVM mmu notifier. > */ > - return pte; > + return 0; > } > > -bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr) > +bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, > + u64 size, bool mkold) > { > - kvm_pte_t pte = 0; > - stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0); > - return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF; > + struct stage2_age_data data = { > + .mkold = mkold, > + }; > + struct kvm_pgtable_walker walker = { > + .cb = stage2_age_walker, > + .arg = &data, > + .flags = KVM_PGTABLE_WALK_LEAF, > + }; > + > + WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker)); Do we really want a WARN_ON() here? From what I can tell, it can be (trivially?) triggered by the previous function returning -EAGAIN if the pte update fails in the case of a shared walk. Otherwise, look OK to me. M. -- Without deviation from the norm, progress is not possible.