Date: Mon, 30 Oct 2023 10:40:28 +0000
Message-ID: <86o7gg1jhf.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vincent Donnefort <vdonnefort@google.com>, oliver.upton@linux.dev,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	kernel-team@android.com, will@kernel.org, willy@infradead.org
Subject: Re: [PATCH v2 2/2] KVM: arm64: Use folio for THP adjustment
In-Reply-To: <418313c5-2094-4aaf-ae43-a1f3bf8e936f@arm.com>
References: <20230928173205.2826598-1-vdonnefort@google.com>
	<20230928173205.2826598-3-vdonnefort@google.com>
	<418313c5-2094-4aaf-ae43-a1f3bf8e936f@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
On Sat, 28 Oct 2023 10:17:17 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 28/09/2023 18:32, Vincent Donnefort wrote:
> > Since commit cb196ee1ef39 ("mm/huge_memory: convert
> > do_huge_pmd_anonymous_page() to use vma_alloc_folio()"), transparent
> > huge pages use folios. This lets us check efficiently whether a page
> > is mapped by a block by simply looking at the folio size, saving a
> > page table walk.
> > 
> > It is safe to read the folio in this path: we have just increased its
> > refcount (GUP from __gfn_to_pfn_memslot()), which prevents attempts
> > at splitting the huge page.
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index de5e5148ef5d..69fcbcc7aca5 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -791,51 +791,6 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> >  	return 0;
> >  }
> >  
> > -static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
> > -	/* We shouldn't need any other callback to walk the PT */
> > -	.phys_to_virt		= kvm_host_va,
> > -};
> > -
> > -static int get_user_mapping_size(struct kvm *kvm, u64 addr)
> > -{
> > -	struct kvm_pgtable pgt = {
> > -		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
> > -		.ia_bits	= vabits_actual,
> > -		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
> > -				   CONFIG_PGTABLE_LEVELS),
> > -		.mm_ops		= &kvm_user_mm_ops,
> > -	};
> > -	unsigned long flags;
> > -	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
> > -	u32 level = ~0;
> > -	int ret;
> > -
> > -	/*
> > -	 * Disable IRQs so that we hazard against a concurrent
> > -	 * teardown of the userspace page tables (which relies on
> > -	 * IPI-ing threads).
> > -	 */
> > -	local_irq_save(flags);
> > -	ret = kvm_pgtable_get_leaf(&pgt, addr, &pte, &level);
> > -	local_irq_restore(flags);
> > -
> > -	if (ret)
> > -		return ret;
> > -
> > -	/*
> > -	 * Not seeing an error, but not updating level? Something went
> > -	 * deeply wrong...
> > -	 */
> > -	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
> > -		return -EFAULT;
> > -
> > -	/* Oops, the userspace PTs are gone... Replay the fault */
> > -	if (!kvm_pte_valid(pte))
> > -		return -EAGAIN;
> > -
> > -	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(level));
> > -}
> > -
> >  static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
> >  	.zalloc_page		= stage2_memcache_zalloc_page,
> >  	.zalloc_pages_exact	= kvm_s2_zalloc_pages_exact,
> > @@ -1274,7 +1229,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
> >   *
> >   * Returns the size of the mapping.
> >   */
> > -static long
> > +static unsigned long
> > transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
> > 			    unsigned long hva, kvm_pfn_t *pfnp,
> > 			    phys_addr_t *ipap)
> > @@ -1287,10 +1242,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  	 * block map is contained within the memslot.
> >  	 */
> >  	if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> > -		int sz = get_user_mapping_size(kvm, hva);
> > -
> > -		if (sz < 0)
> > -			return sz;
> > +		size_t sz = folio_size(pfn_folio(pfn));
> 
> Hi,
> 
> Sorry this is an extremely late reply - I just noticed this because Marc
> mentioned it in another thread.
> 
> This doesn't look quite right to me; just because you have a folio of a
> given size, that doesn't mean the whole thing is mapped into this
> particular address space.
> For example, you could have a (PMD-sized) THP that gets partially
> munmapped - the folio is still PMD-sized but only some of it is mapped
> and should be accessible to the process. Or you could have a large
> file-backed folio (from a filesystem that supports large folios - e.g.
> XFS) but the application only mapped part of the file.
> 
> Perhaps I've misunderstood and those edge cases can't happen here for
> some reason?

I went ahead and applied the following hack to the *current* tree, with
this patch:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 482280fe22d7..de365489a62f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1291,6 +1291,10 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	 */
 	if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		int sz = get_user_mapping_size(kvm, hva);
+		size_t fsz = folio_size(pfn_folio(pfn));
+
+		if (sz != fsz)
+			pr_err("sz = %d fsz = %ld\n", sz, fsz);
 
 		if (sz < 0)
 			return sz;

and sure enough, I see the check firing under *a lot* of memory
pressure:

[84567.458803] sz = 4096 fsz = 2097152
[84620.166018] sz = 4096 fsz = 2097152

So indeed, folio_size() doesn't provide what we need. We absolutely need
to match what is actually mapped in userspace, or things may turn out to
be rather ugly should the other pages that are part of the same folio be
allocated somewhere else. Is that even possible?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel