From: Dev Jain
To: catalin.marinas@arm.com, will@kernel.org
Cc: anshuman.khandual@arm.com, quic_zhenhuah@quicinc.com,
	ryan.roberts@arm.com, kevin.brodsky@arm.com, yangyicong@hisilicon.com,
	joey.gouly@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, david@redhat.com, Dev Jain
Subject: [PATCH v2] arm64: Enable vmalloc-huge with ptdump
Date: Tue, 10 Jun 2025 21:30:48 +0530
Message-Id: <20250610160048.11254-1-dev.jain@arm.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)

arm64 disables
vmalloc-huge when kernel page table dumping is enabled, because an
intermediate table may be removed while ptdump is walking it, potentially
causing the ptdump code to dereference an invalid address.

We want to be able to use ptdump to analyze block vs page mappings in the
kernel page tables. To enable vmalloc-huge with ptdump, synchronize page
table removal in pmd_free_pte_page()/pud_free_pmd_page() against the
ptdump page table walk.

Use mmap_read_lock() rather than the write lock, because two different
vm_structs need not be synchronized against each other: two vmalloc
objects running this same code path point to different page tables, hence
there is no race between them.

For pud_free_pmd_page(), isolate the PMD table first to avoid taking the
lock 512 more times via pmd_free_pte_page().

Note that there is no need to move __flush_tlb_kernel_pgtable() to
immediately after pud_clear(). The only reason to move it would be a need
for a dsb(ishst) (present in __flush_tlb_kernel_pgtable()) immediately
after pud_clear(), but there is none, since the transition is from valid
to invalid, not vice versa.

No issues were observed with mm-selftests, nor while running
test_vmalloc.sh in parallel with dumping the kernel page table through
debugfs in a loop.

v1->v2:
 - Take lock only when CONFIG_PTDUMP_DEBUGFS is on
 - In case of pud_free_pmd_page(), isolate the PMD table to avoid taking
   the lock 512 times again via pmd_free_pte_page()

Signed-off-by: Dev Jain
---
 arch/arm64/include/asm/vmalloc.h |  6 ++---
 arch/arm64/mm/mmu.c              | 43 +++++++++++++++++++++++++++++---
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 12f534e8f3ed..e835fd437ae0 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -12,15 +12,13 @@ static inline bool arch_vmap_pud_supported(pgprot_t prot)
 	/*
 	 * SW table walks can't handle removal of intermediate entries.
 	 */
-	return pud_sect_supported() &&
-	       !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+	return pud_sect_supported();
 }
 
 #define arch_vmap_pmd_supported arch_vmap_pmd_supported
 static inline bool arch_vmap_pmd_supported(pgprot_t prot)
 {
-	/* See arch_vmap_pud_supported() */
-	return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+	return true;
 }
 
 #define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8fcf59ba39db..fa98a62e4baf 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1267,7 +1267,25 @@ int pmd_clear_huge(pmd_t *pmdp)
 	return 1;
 }
 
-int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
+#ifdef CONFIG_PTDUMP_DEBUGFS
+static inline void ptdump_synchronize_lock(void)
+{
+	/* Synchronize against ptdump_walk_pgd() */
+	mmap_read_lock(&init_mm);
+}
+
+static inline void ptdump_synchronize_unlock(void)
+{
+	mmap_read_unlock(&init_mm);
+}
+#else /* CONFIG_PTDUMP_DEBUGFS */
+
+static inline void ptdump_synchronize_lock(void) {}
+static inline void ptdump_synchronize_unlock(void) {}
+
+#endif /* CONFIG_PTDUMP_DEBUGFS */
+
+static int __pmd_free_pte_page(pmd_t *pmdp, unsigned long addr, bool lock)
 {
 	pte_t *table;
 	pmd_t pmd;
@@ -1280,12 +1298,23 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
 	}
 
 	table = pte_offset_kernel(pmdp, addr);
+
+	if (lock)
+		ptdump_synchronize_lock();
 	pmd_clear(pmdp);
+	if (lock)
+		ptdump_synchronize_unlock();
+
 	__flush_tlb_kernel_pgtable(addr);
 	pte_free_kernel(NULL, table);
 	return 1;
 }
 
+int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
+{
+	return __pmd_free_pte_page(pmdp, addr, true);
+}
+
 int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 {
 	pmd_t *table;
@@ -1301,14 +1330,22 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 	}
 
 	table = pmd_offset(pudp, addr);
+
+	/*
+	 * Isolate the PMD table; in case of race with ptdump, this helps
+	 * us to avoid taking the lock in __pmd_free_pte_page()
+	 */
+	ptdump_synchronize_lock();
+	pud_clear(pudp);
+	ptdump_synchronize_unlock();
+
 	pmdp = table;
 	next = addr;
 	end = addr + PUD_SIZE;
 	do {
-		pmd_free_pte_page(pmdp, next);
+		__pmd_free_pte_page(pmdp, next, false);
 	} while (pmdp++, next += PMD_SIZE, next != end);
 
-	pud_clear(pudp);
 	__flush_tlb_kernel_pgtable(addr);
 	pmd_free(NULL, table);
 	return 1;
-- 
2.30.2
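
The changelog's point about isolating the PMD table can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every name below (model_lock(), model_pmd_free(), model_pud_free_isolated(), model_pud_free_per_entry(), count_locks()) is hypothetical, the lock stand-ins merely count acquisitions instead of taking mmap_read_lock(&init_mm), and 512 PMD entries per table assumes a 4K page size. The per-entry variant is a made-up ordering for comparison and is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 512 PMD entries under one PUD, as with 4K pages. */
#define ENTRIES_PER_TABLE 512

/* Number of times the stand-in lock was taken. */
static unsigned long lock_acquisitions;

/* Hypothetical stand-ins for ptdump_synchronize_lock()/unlock(). */
static void model_lock(void)   { lock_acquisitions++; }
static void model_unlock(void) { }

struct model_pmd { bool points_to_pte_table; };
struct model_pud { struct model_pmd *pmd_table; };

/* Models __pmd_free_pte_page(): clear one entry, optionally locked. */
static int model_pmd_free(struct model_pmd *pmd, bool lock)
{
	if (!pmd->points_to_pte_table)
		return 1;

	if (lock)
		model_lock();
	pmd->points_to_pte_table = false;	/* stands in for pmd_clear() */
	if (lock)
		model_unlock();
	return 1;
}

/* Ordering used by the patch: unhook the PMD table first, under the lock. */
static int model_pud_free_isolated(struct model_pud *pud)
{
	struct model_pmd *table = pud->pmd_table;

	assert(table != NULL);
	model_lock();
	pud->pmd_table = NULL;			/* stands in for pud_clear() */
	model_unlock();

	/* The table is unreachable from the PUD now; no lock per entry. */
	for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
		model_pmd_free(&table[i], false);
	return 1;
}

/* Hypothetical comparison: lock around every PMD entry, then the PUD. */
static int model_pud_free_per_entry(struct model_pud *pud)
{
	struct model_pmd *table = pud->pmd_table;

	assert(table != NULL);
	for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
		model_pmd_free(&table[i], true);

	model_lock();
	pud->pmd_table = NULL;
	model_unlock();
	return 1;
}

/* Build a fully populated model table, free it, report lock count. */
unsigned long count_locks(bool isolate_first)
{
	static struct model_pmd table[ENTRIES_PER_TABLE];
	struct model_pud pud = { .pmd_table = table };

	for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
		table[i].points_to_pte_table = true;

	lock_acquisitions = 0;
	if (isolate_first)
		model_pud_free_isolated(&pud);
	else
		model_pud_free_per_entry(&pud);
	return lock_acquisitions;
}
```

In this model, count_locks(true) takes the stand-in lock once per PUD teardown, while count_locks(false) takes it once per PMD entry plus once for the PUD. The model says nothing about memory ordering or TLB maintenance; it only counts lock traffic.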