From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 22CCAC6FA89
	for ; Mon, 12 Sep 2022 03:28:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229517AbiILD2Y (ORCPT );
	Sun, 11 Sep 2022 23:28:24 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56432
	"EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S229615AbiILD2V (ORCPT );
	Sun, 11 Sep 2022 23:28:21 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 339AE25EA9
	for ; Sun, 11 Sep 2022 20:28:07 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dfw.source.kernel.org (Postfix) with ESMTPS id AF2596116A
	for ; Mon, 12 Sep 2022 03:28:06 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 07C0AC433D7;
	Mon, 12 Sep 2022 03:28:06 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1662953286;
	bh=7uoJel3s4vnlxmCuuVdFl3yrg2aGxeGLlqdKjfXig/w=;
	h=Date:To:From:Subject:From;
	b=r5k54S7CslACu5xKQ6nkg8aLU0k6LgVD1L7npRG4OfZ0rX7nKLi5fIVogj2Z1BLFD
	 kApoOfiRHZOV0S+zD1pTGa26r4/3thb2qVR4gyrwGnOhyAVwpdQwRkn1bU75ixZNVJ
	 dlyKXP7wGHC39wjkDMcxYnvz6OAlJ7AQN11kSzb4=
Date: Sun, 11 Sep 2022 20:28:05 -0700
To: mm-commits@vger.kernel.org, ziy@nvidia.com, willy@infradead.org,
	vbabka@suse.cz, tsbogend@alpha.franken.de, songliubraving@fb.com,
	sj@kernel.org, shy828301@gmail.com, rongwei.wang@linux.alibaba.com,
	rientjes@google.com, peterx@redhat.com, pasha.tatashin@soleen.com,
	minchan@kernel.org, mhocko@suse.com, mattst88@gmail.com, linmiaohe@huawei.com,
	kirill.shutemov@linux.intel.com, jrdr.linux@gmail.com, jcmvbkbc@gmail.com,
	James.Bottomley@HansenPartnership.com, ink@jurassic.park.msu.ru,
	hughd@google.com, deller@gmx.de, david@redhat.com, dan.carpenter@oracle.com,
	ckennelly@google.com, chris@zankel.net, axelrasmussen@google.com,
	axboe@kernel.dk, asml.silence@gmail.com, arnd@arndb.de,
	alex.shi@linux.alibaba.com, aarcange@redhat.com, zokeefe@google.com,
	akpm@linux-foundation.org
From: Andrew Morton
Subject: [merged mm-stable] mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage.patch removed from -mm tree
Message-Id: <20220912032806.07C0AC433D7@smtp.kernel.org>
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID:
X-Mailing-List: mm-commits@vger.kernel.org


The quilt patch titled
     Subject: mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage
has been removed from the -mm tree.  Its filename was
     mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: "Zach O'Keefe"
Subject: mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage
Date: Wed, 6 Jul 2022 16:59:26 -0700

When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PMD_MAPPED if the pmd already maps a hugepage.  Note that
SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
file-collapse path, since the latter might identify pte-mapped compound
pages.  This is required by MADV_COLLAPSE which necessarily needs to know
what hugepage-aligned/sized regions are already pmd-mapped.

In order to determine if a pmd already maps a hugepage, refactor
mm_find_pmd():

Return mm_find_pmd() to its pre-commit f72e7dcdd252 ("mm: let
mm_find_pmd fix buggy race with THP fault") behavior.
ksm was the only caller that explicitly wanted a pte-mapping pmd, so open
code the pte-mapping logic there (pmd_present() and pmd_trans_huge() checks).

Undo revert change in commit f72e7dcdd252 ("mm: let mm_find_pmd fix buggy
race with THP fault") that open-coded split_huge_pmd_address() pmd lookup
and use mm_find_pmd() instead.

Link: https://lkml.kernel.org/r/20220706235936.2197195-9-zokeefe@google.com
Signed-off-by: Zach O'Keefe
Reviewed-by: Yang Shi
Cc: Alex Shi
Cc: Andrea Arcangeli
Cc: Arnd Bergmann
Cc: Axel Rasmussen
Cc: Chris Kennelly
Cc: Chris Zankel
Cc: David Hildenbrand
Cc: David Rientjes
Cc: Helge Deller
Cc: Hugh Dickins
Cc: Ivan Kokshaysky
Cc: James Bottomley
Cc: Jens Axboe
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Matt Turner
Cc: Max Filippov
Cc: Miaohe Lin
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Pasha Tatashin
Cc: Pavel Begunkov
Cc: Peter Xu
Cc: Rongwei Wang
Cc: SeongJae Park
Cc: Song Liu
Cc: Thomas Bogendoerfer
Cc: Vlastimil Babka
Cc: Zi Yan
Cc: Dan Carpenter
Cc: "Souptick Joarder (HPE)"
Signed-off-by: Andrew Morton
---

 include/trace/events/huge_memory.h |    1
 mm/huge_memory.c                   |   18 --------
 mm/internal.h                      |    2
 mm/khugepaged.c                    |   60 +++++++++++++++++++++------
 mm/ksm.c                           |   10 ++++
 mm/rmap.c                          |   15 ++----
 6 files changed, 67 insertions(+), 39 deletions(-)

--- a/include/trace/events/huge_memory.h~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/include/trace/events/huge_memory.h
@@ -11,6 +11,7 @@
 	EM( SCAN_FAIL,			"failed")			\
 	EM( SCAN_SUCCEED,		"succeeded")			\
 	EM( SCAN_PMD_NULL,		"pmd_null")			\
+	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
 	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
 	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
 	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
--- a/mm/huge_memory.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/huge_memory.c
@@ -2286,25 +2286,11 @@ out:
 void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address,
 		bool freeze, struct folio *folio)
 {
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
+	pmd_t *pmd = mm_find_pmd(vma->vm_mm, address);
 
-	pgd = pgd_offset(vma->vm_mm, address);
-	if (!pgd_present(*pgd))
+	if (!pmd)
 		return;
 
-	p4d = p4d_offset(pgd, address);
-	if (!p4d_present(*p4d))
-		return;
-
-	pud = pud_offset(p4d, address);
-	if (!pud_present(*pud))
-		return;
-
-	pmd = pmd_offset(pud, address);
-
 	__split_huge_pmd(vma, pmd, address, freeze, folio);
 }
 
--- a/mm/internal.h~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/internal.h
@@ -187,7 +187,7 @@ extern void reclaim_throttle(pg_data_t *
 /*
  * in mm/rmap.c:
  */
-extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 
 /*
  * in mm/page_alloc.c
--- a/mm/khugepaged.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/khugepaged.c
@@ -28,6 +28,7 @@ enum scan_result {
 	SCAN_FAIL,
 	SCAN_SUCCEED,
 	SCAN_PMD_NULL,
+	SCAN_PMD_MAPPED,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -877,6 +878,45 @@ static int hugepage_vma_revalidate(struc
 	return SCAN_SUCCEED;
 }
 
+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+				   unsigned long address,
+				   pmd_t **pmd)
+{
+	pmd_t pmde;
+
+	*pmd = mm_find_pmd(mm, address);
+	if (!*pmd)
+		return SCAN_PMD_NULL;
+
+	pmde = pmd_read_atomic(*pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+	barrier();
+#endif
+	if (!pmd_present(pmde))
+		return SCAN_PMD_NULL;
+	if (pmd_trans_huge(pmde))
+		return SCAN_PMD_MAPPED;
+	if (pmd_bad(pmde))
+		return SCAN_PMD_NULL;
+	return SCAN_SUCCEED;
+}
+
+static int check_pmd_still_valid(struct mm_struct *mm,
+				 unsigned long address,
+				 pmd_t *pmd)
+{
+	pmd_t *new_pmd;
+	int result = find_pmd_or_thp_or_none(mm, address, &new_pmd);
+
+	if (result != SCAN_SUCCEED)
+		return result;
+	if (new_pmd != pmd)
+		return SCAN_FAIL;
+	return SCAN_SUCCEED;
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -988,9 +1028,8 @@ static int collapse_huge_page(struct mm_
 		goto out_nolock;
 	}
 
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -1018,7 +1057,8 @@ static int collapse_huge_page(struct mm_
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
-	if (mm_find_pmd(mm, address) != pmd)
+	result = check_pmd_still_valid(mm, address, pmd);
+	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 
 	anon_vma_lock_write(vma->anon_vma);
@@ -1121,11 +1161,9 @@ static int khugepaged_scan_pmd(struct mm
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
@@ -1383,8 +1421,7 @@ void collapse_pte_mapped_thp(struct mm_s
 	if (!PageHead(hpage))
 		goto drop_hpage;
 
-	pmd = mm_find_pmd(mm, haddr);
-	if (!pmd)
+	if (find_pmd_or_thp_or_none(mm, haddr, &pmd) != SCAN_SUCCEED)
 		goto drop_hpage;
 
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1502,8 +1539,7 @@ static void retract_page_tables(struct a
 		if (vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
 		mm = vma->vm_mm;
-		pmd = mm_find_pmd(mm, addr);
-		if (!pmd)
+		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
 			continue;
 		/*
 		 * We need exclusive mmap_lock to retract page table.
--- a/mm/ksm.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/ksm.c
@@ -1134,6 +1134,7 @@ static int replace_page(struct vm_area_s
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t *pmd;
+	pmd_t pmde;
 	pte_t *ptep;
 	pte_t newpte;
 	spinlock_t *ptl;
@@ -1148,6 +1149,15 @@ static int replace_page(struct vm_area_s
 	pmd = mm_find_pmd(mm, addr);
 	if (!pmd)
 		goto out;
+	/*
+	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
+	 * without holding anon_vma lock for write. So when looking for a
+	 * genuine pmde (in which to find pte), test present and !THP together.
+	 */
+	pmde = *pmd;
+	barrier();
+	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
+		goto out;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
--- a/mm/rmap.c~mm-khugepaged-record-scan_pmd_mapped-when-scan_pmd-finds-hugepage
+++ a/mm/rmap.c
@@ -767,13 +767,17 @@ unsigned long page_address_in_vma(struct
 	return vma_address(page, vma);
 }
 
+/*
+ * Returns the actual pmd_t* where we expect 'address' to be mapped from, or
+ * NULL if it doesn't exist. No guarantees / checks on what the pmd_t*
+ * represents.
+ */
 pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd = NULL;
-	pmd_t pmde;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -788,15 +792,6 @@ pmd_t *mm_find_pmd(struct mm_struct *mm,
 		goto out;
 
 	pmd = pmd_offset(pud, address);
-	/*
-	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
-	 * without holding anon_vma lock for write. So when looking for a
-	 * genuine pmde (in which to find pte), test present and !THP together.
-	 */
-	pmde = *pmd;
-	barrier();
-	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
-		pmd = NULL;
 out:
 	return pmd;
 }
_

Patches currently in -mm which might be from zokeefe@google.com are

mm-shmem-add-flag-to-enforce-shmem-thp-in-hugepage_vma_check.patch
mm-khugepaged-attempt-to-map-file-shmem-backed-pte-mapped-thps-by-pmds.patch
mm-madvise-add-file-and-shmem-support-to-madv_collapse.patch
mm-khugepaged-add-tracepoint-to-hpage_collapse_scan_file.patch
selftests-vm-dedup-thp-helpers.patch
selftests-vm-modularize-thp-collapse-memory-operations.patch
selftests-vm-add-thp-collapse-file-and-tmpfs-testing.patch
selftests-vm-add-thp-collapse-shmem-testing.patch
selftests-vm-add-file-shmem-madv_collapse-selftest-for-cleared-pmd.patch
selftests-vm-add-selftest-for-madv_collapse-of-uffd-minor-memory.patch