Date: Wed, 12 Jul 2023 10:58:48 -0700
To: mm-commits@vger.kernel.org, ziy@nvidia.com, zhengqi.arch@bytedance.com, zackr@vmware.com, yuzhao@google.com, ying.huang@intel.com, willy@infradead.org, will@kernel.org, vishal.moola@gmail.com, vbabka@suse.cz, thomas.hellstrom@linux.intel.com, surenb@google.com, steven.price@arm.com, song@kernel.org,
    sj@kernel.org, shy828301@gmail.com, rppt@kernel.org, rcampbell@nvidia.com,
    peterz@infradead.org, peterx@redhat.com, pasha.tatashin@soleen.com,
    naoya.horiguchi@nec.com, mpe@ellerman.id.au, minchan@kernel.org,
    mike.kravetz@oracle.com, mgorman@techsingularity.net, lstoakes@gmail.com,
    linux@armlinux.org.uk, linmiaohe@huawei.com,
    kirill.shutemov@linux.intel.com, jgg@ziepe.ca, jannh@google.com,
    ira.weiny@intel.com, imbrenda@linux.ibm.com, hch@infradead.org,
    hca@linux.ibm.com, gor@linux.ibm.com, gerald.schaefer@linux.ibm.com,
    david@redhat.com, davem@davemloft.net, christophe.leroy@csgroup.eu,
    borntraeger@linux.ibm.com, axelrasmussen@google.com, apopple@nvidia.com,
    anshuman.khandual@arm.com, aneesh.kumar@linux.ibm.com,
    agordeev@linux.ibm.com, hughd@google.com, akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-khugepaged-delete-khugepaged_collapse_pte_mapped_thps.patch added to mm-unstable branch
Message-Id: <20230712175849.55BAAC433C9@smtp.kernel.org>
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID:
X-Mailing-List: mm-commits@vger.kernel.org


The patch titled
     Subject: mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
has been added to the -mm mm-unstable branch.  Its filename is
     mm-khugepaged-delete-khugepaged_collapse_pte_mapped_thps.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-khugepaged-delete-khugepaged_collapse_pte_mapped_thps.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Hugh Dickins
Subject: mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
Date: Tue, 11 Jul 2023 21:43:36 -0700 (PDT)

Now that retract_page_tables() can retract page tables reliably, without
depending on trylocks, delete all the apparatus for khugepaged to try
again later: khugepaged_collapse_pte_mapped_thps() etc; and free up the
per-mm memory which was set aside for that in the khugepaged_mm_slot.

But one part of that is worth keeping: when hpage_collapse_scan_file()
found SCAN_PTE_MAPPED_HUGEPAGE, that address was noted in the mm_slot to
be tried for retraction later - catching, for example, page tables where a
reversible mprotect() of a portion had required splitting the pmd, but now
it can be recollapsed.  Call collapse_pte_mapped_thp() directly in this
case (why was it deferred before?  I assume an issue with needing
mmap_lock for write, but now it's only needed for read).

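As a rough illustration of the direct call described in the changelog, here
is a user-space sketch, not kernel code. The toy_* functions, struct
toy_range and the enum values are hypothetical stand-ins invented for this
example. Only the control flow is meant to mirror the patched loop in
khugepaged_scan_mm_slot(): scan with mmap_lock dropped, retake it for read
when the scan reports SCAN_PTE_MAPPED_HUGEPAGE, stop if the mm is exiting,
otherwise collapse at once and count SCAN_PMD_MAPPED as success.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's scan result codes; values are illustrative. */
enum scan_result {
	SCAN_FAIL,
	SCAN_SUCCEED,
	SCAN_PMD_MAPPED,
	SCAN_PTE_MAPPED_HUGEPAGE,
};

/* Hypothetical toy model of one PMD-sized file range in one mm. */
struct toy_range {
	bool mm_exiting;	/* models hpage_collapse_test_exit() being true */
	bool pte_mapped_thp;	/* scan would report SCAN_PTE_MAPPED_HUGEPAGE */
	bool huge_pmd_present;	/* someone else already installed a huge pmd */
	int read_locks;		/* models how often mmap_lock is held for read */
};

/* Models hpage_collapse_scan_file(): called with the lock dropped. */
static enum scan_result toy_scan_file(const struct toy_range *r)
{
	assert(r->read_locks == 0);
	return r->pte_mapped_thp ? SCAN_PTE_MAPPED_HUGEPAGE : SCAN_FAIL;
}

/* Models collapse_pte_mapped_thp(): needs the lock held for read only. */
static enum scan_result toy_collapse(struct toy_range *r)
{
	assert(r->read_locks == 1);
	if (r->huge_pmd_present)
		return SCAN_PMD_MAPPED;
	r->pte_mapped_thp = false;
	r->huge_pmd_present = true;
	return SCAN_SUCCEED;
}

/*
 * One scan step in the style of the patched loop.  Returns false when the
 * scan has to stop because the mm is exiting; otherwise stores the outcome
 * in *result and bumps *collapsed on success.
 */
static bool toy_scan_step(struct toy_range *r, enum scan_result *result,
			  int *collapsed)
{
	bool keep_going = true;

	*result = toy_scan_file(r);
	if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
		r->read_locks++;		/* mmap_read_lock(mm) */
		if (r->mm_exiting) {
			keep_going = false;	/* goto breakouterloop */
		} else {
			*result = toy_collapse(r);
			if (*result == SCAN_PMD_MAPPED)
				*result = SCAN_SUCCEED;
		}
		r->read_locks--;		/* toy only: unlock right here */
	}
	if (keep_going && *result == SCAN_SUCCEED)
		++*collapsed;
	return keep_going;
}
```

Calling toy_scan_step() on a range with pte_mapped_thp set collapses it in
the same step, with no per-mm array of deferred addresses. One deliberate
simplification: the sketch drops its pretend lock before returning, whereas
the patch sets mmap_locked = true and leaves the unlock to code outside the
hunk.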
Link: https://lkml.kernel.org/r/a5dce57-6dfa-5559-4698-e817eb2f993@google.com
Signed-off-by: Hugh Dickins
Cc: Alexander Gordeev
Cc: Alistair Popple
Cc: Aneesh Kumar K.V
Cc: Anshuman Khandual
Cc: Axel Rasmussen
Cc: Christian Borntraeger
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: Claudio Imbrenda
Cc: David Hildenbrand
Cc: "David S. Miller"
Cc: Gerald Schaefer
Cc: Heiko Carstens
Cc: Huang, Ying
Cc: Ira Weiny
Cc: Jann Horn
Cc: Jason Gunthorpe
Cc: Kirill A. Shutemov
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox (Oracle)
Cc: Mel Gorman
Cc: Miaohe Lin
Cc: Michael Ellerman
Cc: Mike Kravetz
Cc: Mike Rapoport (IBM)
Cc: Minchan Kim
Cc: Naoya Horiguchi
Cc: Pavel Tatashin
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Qi Zheng
Cc: Ralph Campbell
Cc: Russell King
Cc: SeongJae Park
Cc: Song Liu
Cc: Steven Price
Cc: Suren Baghdasaryan
Cc: Thomas Hellström
Cc: Vasily Gorbik
Cc: Vishal Moola (Oracle)
Cc: Vlastimil Babka
Cc: Will Deacon
Cc: Yang Shi
Cc: Yu Zhao
Cc: Zack Rusin
Cc: Zi Yan
Signed-off-by: Andrew Morton
---

 mm/khugepaged.c |  127 ++++++----------------------------------------
 1 file changed, 17 insertions(+), 110 deletions(-)

--- a/mm/khugepaged.c~mm-khugepaged-delete-khugepaged_collapse_pte_mapped_thps
+++ a/mm/khugepaged.c
@@ -93,8 +93,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_s
 
 static struct kmem_cache *mm_slot_cache __read_mostly;
 
-#define MAX_PTE_MAPPED_THP 8
-
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -108,15 +106,9 @@ struct collapse_control {
 /**
  * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
  * @slot: hash lookup from mm to mm_slot
- * @nr_pte_mapped_thp: number of pte mapped THP
- * @pte_mapped_thp: address array corresponding pte mapped THP
  */
 struct khugepaged_mm_slot {
 	struct mm_slot slot;
-
-	/* pte-mapped THP in this mm */
-	int nr_pte_mapped_thp;
-	unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
 };
 
 /**
@@ -1441,50 +1433,6 @@ static void collect_mm_slot(struct khuge
 }
 
 #ifdef CONFIG_SHMEM
-/*
- * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
- * khugepaged should try to collapse the page table.
- *
- * Note that following race exists:
- * (1) khugepaged calls khugepaged_collapse_pte_mapped_thps() for mm_struct A,
- *     emptying the A's ->pte_mapped_thp[] array.
- * (2) MADV_COLLAPSE collapses some file extent with target mm_struct B, and
- *     retract_page_tables() finds a VMA in mm_struct A mapping the same extent
- *     (at virtual address X) and adds an entry (for X) into mm_struct A's
- *     ->pte-mapped_thp[] array.
- * (3) khugepaged calls khugepaged_collapse_scan_file() for mm_struct A at X,
- *     sees a pte-mapped THP (SCAN_PTE_MAPPED_HUGEPAGE) and adds an entry
- *     (for X) into mm_struct A's ->pte-mapped_thp[] array.
- * Thus, it's possible the same address is added multiple times for the same
- * mm_struct.  Should this happen, we'll simply attempt
- * collapse_pte_mapped_thp() multiple times for the same address, under the same
- * exclusive mmap_lock, and assuming the first call is successful, subsequent
- * attempts will return quickly (without grabbing any additional locks) when
- * a huge pmd is found in find_pmd_or_thp_or_none().  Since this is a cheap
- * check, and since this is a rare occurrence, the cost of preventing this
- * "multiple-add" is thought to be more expensive than just handling it, should
- * it occur.
- */
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	struct khugepaged_mm_slot *mm_slot;
-	struct mm_slot *slot;
-	bool ret = false;
-
-	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
-
-	spin_lock(&khugepaged_mm_lock);
-	slot = mm_slot_lookup(mm_slots_hash, mm);
-	mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
-	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
-		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
-		ret = true;
-	}
-	spin_unlock(&khugepaged_mm_lock);
-	return ret;
-}
-
 /* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmdp, struct page *hpage)
@@ -1708,29 +1656,6 @@ drop_hpage:
 	return result;
 }
 
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-	struct mm_slot *slot = &mm_slot->slot;
-	struct mm_struct *mm = slot->mm;
-	int i;
-
-	if (likely(mm_slot->nr_pte_mapped_thp == 0))
-		return;
-
-	if (!mmap_write_trylock(mm))
-		return;
-
-	if (unlikely(hpage_collapse_test_exit(mm)))
-		goto out;
-
-	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
-		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false);
-
-out:
-	mm_slot->nr_pte_mapped_thp = 0;
-	mmap_write_unlock(mm);
-}
-
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
@@ -2371,16 +2296,6 @@ static int hpage_collapse_scan_file(stru
 {
 	BUILD_BUG();
 }
-
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-}
-
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	return false;
-}
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
@@ -2410,7 +2325,6 @@ static unsigned int khugepaged_scan_mm_s
 		khugepaged_scan.mm_slot = mm_slot;
 	}
 	spin_unlock(&khugepaged_mm_lock);
-	khugepaged_collapse_pte_mapped_thps(mm_slot);
 
 	mm = slot->mm;
 	/*
@@ -2463,36 +2377,29 @@ skip:
 						khugepaged_scan.address);
 
 				mmap_read_unlock(mm);
-				*result = hpage_collapse_scan_file(mm,
-								   khugepaged_scan.address,
-								   file, pgoff, cc);
 				mmap_locked = false;
+				*result = hpage_collapse_scan_file(mm,
+					khugepaged_scan.address, file, pgoff, cc);
+				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
+					mmap_read_lock(mm);
+					mmap_locked = true;
+					if (hpage_collapse_test_exit(mm)) {
+						fput(file);
+						goto breakouterloop;
+					}
+					*result = collapse_pte_mapped_thp(mm,
+						khugepaged_scan.address, false);
+					if (*result == SCAN_PMD_MAPPED)
+						*result = SCAN_SUCCEED;
+				}
 				fput(file);
 			} else {
 				*result = hpage_collapse_scan_pmd(mm, vma,
-								  khugepaged_scan.address,
-								  &mmap_locked,
-								  cc);
+					khugepaged_scan.address, &mmap_locked, cc);
 			}
-			switch (*result) {
-			case SCAN_PTE_MAPPED_HUGEPAGE: {
-				pmd_t *pmd;
-
-				*result = find_pmd_or_thp_or_none(mm,
-								  khugepaged_scan.address,
-								  &pmd);
-				if (*result != SCAN_SUCCEED)
-					break;
-				if (!khugepaged_add_pte_mapped_thp(mm,
-								   khugepaged_scan.address))
-					break;
-			} fallthrough;
-			case SCAN_SUCCEED:
+
+			if (*result == SCAN_SUCCEED)
 				++khugepaged_pages_collapsed;
-				break;
-			default:
-				break;
-			}
 
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
_

Patches currently in -mm which might be from hughd@google.com are

mm-userfaultfd-add-new-uffdio_poison-ioctl-fix.patch
mm-pgtable-add-rcu_read_lock-and-rcu_read_unlocks.patch
mm-pgtable-add-pae-safety-to-__pte_offset_map.patch
arm-adjust_pte-use-pte_offset_map_nolock.patch
powerpc-assert_pte_locked-use-pte_offset_map_nolock.patch
powerpc-add-pte_free_defer-for-pgtables-sharing-page.patch
sparc-add-pte_free_defer-for-pte_t-pgtable_t.patch
s390-add-pte_free_defer-for-pgtables-sharing-page.patch
mm-pgtable-add-pte_free_defer-for-pgtable-as-page.patch
mm-khugepaged-retract_page_tables-without-mmap-or-vma-lock.patch
mm-khugepaged-collapse_pte_mapped_thp-with-mmap_read_lock.patch
mm-khugepaged-delete-khugepaged_collapse_pte_mapped_thps.patch
mm-delete-mmap_write_trylock-and-vma_try_start_write.patch
mm-pgtable-notes-on-pte_offset_map.patch